




JUNE 1977 
VOL. 24, NO. 2 


NAVSO P-1278 



Murray A. Geisler 
Logistics Management Institute 


W. H. Marlow 
The George Washington University 

Bruce J. McDonald 
Office of Naval Research 


Seymour M. Selig 

Office of Naval Research 

Arlington, Virginia 22217 


Marvin Denicoff 

Office of Naval Research 

Alan J. Hoffman 
IBM Corporation 

Neal D. Glassman 

Office of Naval Research 

Jack Laderman 

Bronx, New York 

Thomas L. Saaty 

University of Pennsylvania 

Henry Solomon 

The George Washington University 

The Naval Research Logistics Quarterly is devoted to the dissemination of scientific information in logistics and 
will publish research and expository papers, including those in certain areas of mathematics, statistics, and economics, 
relevant to the over-all effort to improve the efficiency and effectiveness of logistics operations. 

Information for Contributors is indicated on inside back cover. 

The Naval Research Logistics Quarterly is published by the Office of Naval Research in the months of March, June, 
September, and December and can be purchased from the Superintendent of Documents, U.S. Government Printing 
Office, Washington, D.C. 20402. Subscription Price: $11.15 a year in the U.S. and Canada, $13.95 elsewhere. Cost of
individual issues may be obtained from the Superintendent of Documents. 

The views and opinions expressed in this Journal are those of the authors and not necessarily those of the Office 

of Naval Research. 

Issuance of this periodical approved in accordance with Department of the Navy Publications and Printing Regulations, 

P-35 (Revised 1-74). 


A. M. Geoffrion 

Graduate School of Management 

University of California, Los Angeles 

Los Angeles, California 


A complete logistical planning model of a firm or public system should include 
activities having to do with the procurement of supplies. Not infrequently, however, 
procurement aspects are difficult to model because of their relatively complex and
evanescent nature. This raises the issue of how to build an overall logistics model in 
spite of such difficulties. This paper offers some suggestions toward this end which 
enable the procurement side of a model to be simplified via commodity aggregation 
in a "controlled" way, that is, in such a manner that the modeler can know and 
control in advance of solving his model how much loss of accuracy will be incurred 
for the solutions to the (aggregated) overall model. 


In this paper the term "procurement" is used in a broad sense that includes materials management of raw materials and parts for a manufacturing firm, the acquisition of goods for subsequent distribution by a wholesale firm, the procurement of supplies and materials by a service organization or system, and similar situations. The essential point is that we are addressing the "initial" rather than
the "final" stage of a logistics system. See, for instance, the recent book by D. Bowersox [2] which 
makes the distinction in terms of material management (supplier-oriented) and physical distribu- 
tion management (customer-oriented). 

Whereas it is the large number of customers and their ordering idiosyncrasies that tend to 
make the final stage of a logistics system hard to model, it is the large number of suppliers and 
items and sometimes the constantly changing patterns of procurement that frequently make the 
initial stage difficult to model. Aggregation of customers on a geographic basis into customer 
zones and aggregation of delivered products (or services) into product groups are commonly used 
to simplify the final stage of a logistics planning model. Similar aggregations can be used to sim- 
plify the initial stage, but satisfactory simplifications may be more difficult to achieve because 
of the influence of differential supply costs among suppliers and the greater degree of uniqueness 
as to which suppliers provide what. These influences seem to call for a relatively greater amount 

*This research was partially supported by the National Science Foundation and the Office of Naval Research.



of detail to be preserved in the procurement stage of a planning model. Unfortunately, this could
require the preparation of unduly detailed procurement forecasts — which suppliers will be able 
to supply what items at what prices in what annual quantities. The difficulties of assembling this 
data could be out of proportion to the relative importance of procurement as a component of the 
total logistics planning model. Even worse, it may not be sensible to impose strict model control 
in the traditional linear programming sense over procurement activities at so great a level of detail. 

A reasonable response to these possible difficulties is to take a more flexible attitude toward the modeling of procurement than is customary among devotees of mathematical programming. Namely, look upon the procurement pattern as an aspect of the problem that is partly given objectively and partly under the analyst's control as though it were a policy parameter. View the procurement pattern as something whose influence is as much to be understood as it is to be optimized.

The aim of this paper is to provide a rigorous framework within which this flexible modeling 
attitude can be exercised. We are particularly interested in a priori error bounds concerning the 
accuracy of the full logistics planning model as it is influenced by aggregating procurement items. 
So far as we are aware, our results along these lines are without precedent. 

A companion paper [5] develops similar results in the context of customer aggregation. 


As a point of departure, consider the following logistics planning model. 

Planning Model P 

(1)  Minimize_{x, y, z}  Σ_ijk c_ijk x_ijk + F(y, z)

(2)  subject to  S_ij ≤ Σ_k x_ijk ≤ S̄_ij,  all ij

(3)  Σ_j x_ijk = Σ_l D_il y_kl,  all ik

(4)  Σ_k y_kl = 1,  all l

(5)  x_ijk ≥ 0,  all ijk

(6)  y_kl ≥ 0,  all kl, and (y, z) ∈ Ω.

The following interpretations will be used:

i = indexes procurement items (raw materials, parts, finished goods, etc.).

j = indexes geographical procurement zones.

k = indexes the facilities being supplied.

l = indexes customers.

x_ijk = a variable giving the annual amount of item i procured from zone j for facility k.

y_kl = a variable giving the fraction of the annual needs of customer l (for goods or services) satisfied by facility k.

z = a vector of additional (possibly logistical) variables.

c_ijk = unit cost of procurement plus transportation associated with the flow x_ijk.

F(y, z) = the total annual costs associated with (y, z) exclusive of procurement and inbound transportation (typically, facility-related costs plus outbound transportation costs).

S_ij (S̄_ij) = a lower (upper) limit on the annual amount of item i procured from zone j (partly given and partly at the analyst's discretion).

D_il = the amount of item i required to satisfy the total annual needs of customer l.

Ω = a constraint set that must be satisfied by (y, z).

It is understood that a list L_x of allowable triples (i, j, k) is given to reflect which procurement zones can provide which items to which facilities, and that all summations and constraint enumerations run only over allowable combinations. For instance, the enumeration in (2) over "ij" runs over the pairs (i, j) such that (i, j, k) ∈ L_x for some k. Similarly, a list L_y is given which specifies which facilities can serve which customers.

Constraints (2) control the procurement pattern. An historical procurement pattern (or some other preconceived pattern) can be enforced by taking corresponding S_ij and S̄_ij's to be the same or nearly the same. The latitude for departure from the preconceived pattern increases as S̄_ij − S_ij increases. A necessary condition for feasibility is

(7)  Σ_j S_ij ≤ Σ_l D_il ≤ Σ_j S̄_ij  for all i.

The objective function (1) gives the total cost associated with logistical activities. We have 
already discussed (2). Constraints (3) specify that each facility must receive exactly enough of 
each item to satisfy the needs of the customers it serves. This requires that the goods or services
demanded by each customer can be converted into corresponding requirements for the constituent 
items (it is immaterial whether the facilities do manufacturing or distribution or service or some 
combination thereof). Constraints (4) specify that the full needs of each customer must be satisfied. 
Constraints (5) and (6) impose whatever other requirements on the variables may be needed for 
system feasibility. 

Observe that for fixed y and z, the optimization over x separates into independent subproblems
for each i — each a slight generalization of the classical minimum cost transportation problem. 
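This separability is easy to exploit computationally. The sketch below is a minimal illustration on invented data: a brute-force enumeration stands in for a real transportation code (exact here, since a transportation LP with integer data has an integer optimal vertex), solving one item's subproblem for fixed y with zero lower bounds.

```python
from itertools import product

def solve_item(cost, demand, upper, zones, facilities):
    """min sum_jk cost[j,k]*x[j,k]  s.t.  sum_k x[j,k] <= upper[j]
    (zone limit, constraint (2) with zero lower bound) and
    sum_j x[j,k] = demand[k] (facility requirement, constraint (3))."""
    best_val, best_x = float("inf"), None
    # all ways to split each facility's (integer) demand among the zones
    splits = [[s for s in product(range(demand[k] + 1), repeat=len(zones))
               if sum(s) == demand[k]]
              for k in facilities]
    for combo in product(*splits):
        x = {(j, k): combo[ki][ji]
             for ki, k in enumerate(facilities)
             for ji, j in enumerate(zones)}
        if any(sum(x[j, k] for k in facilities) > upper[j] for j in zones):
            continue  # zone capacity violated
        val = sum(cost[j, k] * x[j, k] for (j, k) in x)
        if val < best_val:
            best_val, best_x = val, x
    return best_val, best_x

# invented instance: two zones, two facilities
val, x = solve_item({(1, 1): 1, (2, 1): 3, (1, 2): 2, (2, 2): 1},  # c_jk
                    {1: 2, 2: 1},   # facility demands induced by fixed y
                    {1: 2, 2: 2},   # zone upper limits for this item
                    [1, 2], [1, 2])
print(val, x)
```

In model P one such subproblem would be solved for each item i, which is precisely the structure that aggregation over i ∈ I destroys and that P_{I,b} restores.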

Because of the complete generality of F and Ω, the model could be set up to determine the least
cost facility locations satisfying a desired level of customer service. Normally this would require 
that F be discontinuous in order to accommodate fixed costs, or some binary z-variables could be 
introduced to achieve the same effect. The model could also be set up to provide for multiple com- 
modities flowing to customers from the facilities, unique assignment of customers to facilities for 
certain commodities, and many other problem features. We prefer to leave the model in its general 
form (l)-(6) because these and many other special cases are thereby treated simultaneously with 
minimum notational complexity. 

The model as stated is actually just a point of departure for the models we wish to
study. Its chief shortcoming is that it may involve too great a level of detail regarding procurement 
from the viewpoint of policy and also sheer size. Consider first the policy aspect. Model P places 
limits on the procurement pattern (via (2)) on an item-by-item basis. Except for items of major 
importance, this seems like an excessive degree of control and may not even be meaningful in 
situations where suppliers are changed frequently on the basis of current price and availability. It 


would make more sense when there are many items of small importance to aggregate some of the 
constraints in (2) . Suppose this is done for some subset I of items. The result is 

Planning Model Pi 
the same as planning model P, except that (2) is replaced by 

(2.1)  S_ij ≤ Σ_k x_ijk ≤ S̄_ij,  all ij with i ∉ I

(2.2)  S_{I,j} ≤ Σ_{i∈I} Σ_k x_ijk ≤ S̄_{I,j},  all j,

where

(8)  S_{I,j} ≜ Σ_{i∈I} S_ij  and  S̄_{I,j} ≜ Σ_{i∈I} S̄_ij.

This version seems more reasonable from a policy standpoint in that the procurement pattern for items I is now stipulated on an aggregate basis. The numbers S_{I,j} and S̄_{I,j} would be interpreted rather freely since their formal constituents S_ij and S̄_ij might be poorly known or perhaps even meaningless.

There is, of course, a natural generalization of P_I that aggregates the procurement pattern constraints for several subsets of items. The analysis of this generalization is a simple extension of the results to be obtained for P_I (see the Remark in Appendix 1).

Model P_I is better from a policy standpoint but it still may be too large. The number of variables is unchanged, although the number of type (2) constraints has diminished. Moreover, a possible new difficulty arises in that the mathematical structure of P_I is more complex than that of P. This is due to the fact that aggregating the type (2) constraints over i ∈ I has the effect of coupling together what previously was a collection of independent transportation-like subproblems in the x-variables when y and z are fixed. The new coupling tends to diminish the computational effectiveness of solution methods that exploit the natural separation into subproblems when y and z are held fixed temporarily (e.g., methods based on Benders decomposition [4]). The nice structure of P could be restored, and the size of P_I much reduced, by completing the aggregation with respect to items I begun in the passage from P to P_I. This involves replacement of the variables x_ijk with i ∈ I by aggregate variables ξ_jk, say, so that the following single transportation-like subproblem replaces the coupled subproblems of P_I for fixed y:

Minimize_{ξ≥0}  Σ_jk b_jk ξ_jk

subject to

(2.2A)  S_{I,j} ≤ Σ_k ξ_jk ≤ S̄_{I,j},  all j

(3.1)  Σ_j ξ_jk = Σ_{i∈I} Σ_l D_il y_kl,  all k,

where the b_jk's are plausible surrogates for the c_ijk's over i ∈ I. Variable ξ_jk is interpreted as a surrogate for Σ_{i∈I} x_ijk, and (3.1) is interpreted as requiring facility k to receive enough of the items in I to meet its needs in the aggregate.



This further aggregation of P_I leads to

Planning Model P_{I,b}

Minimize_{x, y, z, ξ}  Σ_{ijk: i∉I} c_ijk x_ijk + Σ_jk b_jk ξ_jk + F(y, z) + L(y; b)

subject to

(2.1)  S_ij ≤ Σ_k x_ijk ≤ S̄_ij,  all ij with i ∉ I

(2.2A)  S_{I,j} ≤ Σ_k ξ_jk ≤ S̄_{I,j},  all j

(3.1)  Σ_j ξ_jk = Σ_{i∈I} Σ_l D_il y_kl,  all k

(3.2)  Σ_j x_ijk = Σ_l D_il y_kl,  all ik with i ∉ I

(4)  Σ_k y_kl = 1,  all l

(5.1)  x_ijk ≥ 0, all ijk with i ∉ I;  ξ_jk ≥ 0, all jk such that ijk exists for i ∈ I

(6)  y_kl ≥ 0, all kl, and (y, z) ∈ Ω,

where L(y; b) is some linear function of y designed to "compensate" for aggregation error in spite of the arbitrary choice of b.

Notice that the mathematical structure of P_{I,b} is identical to that of P (with the addition to the objective function of a new term linear in y, which seems innocuous enough). P_{I,b} is smaller in that the items of I have been aggregated together throughout.

The major task at this point is to understand the relationship between P_I and P_{I,b}. Our main results in this direction are summarized in the next section.


As it turns out, a natural choice for the L function exists for which a nearly ideal relationship can be established between P_I and P_{I,b}. In particular, an a priori bound can be obtained on the difference between their optimal values. Such a bound can be obtained for any choice of b, and in fact furnishes a useful criterion for making this choice.

It will be convenient to refer to the so-called Range function, which is defined for any collection {a_1, . . ., a_n} of scalars as

Range_{1≤j≤n} {a_j} ≜ Max_{1≤j≤n} {a_j} − Min_{1≤j≤n} {a_j}.

The notation v(·) will refer to the optimal value of an optimization problem.

MAIN THEOREM: Assume that the same jk links exist for every item in some subset I. Let b_jk be chosen arbitrarily for these links, and take the compensation function L to be

(11)  L(y; b) = Σ_kl ( Σ_{i∈I} D_il Min_j {c_ijk − b_jk} ) y_kl.

Then

(12)  v(P_{I,b}) ≤ v(P_I) ≤ v(P_{I,b}) + ε_b,

where

(13)  ε_b ≜ Σ_l Max_k { Σ_{i∈I} D_il Range_j {c_ijk − b_jk} }.

Moreover, a complete ε_b-optimal solution of P_I can be obtained from any optimal solution (x̄, ȳ, z̄, ξ̄) to P_{I,b} by using (x̄, ȳ, z̄) as is and supplementing it by values for the missing x_ijk for i ∈ I according to the disaggregation formula: for all ijk with i ∈ I, put

(14)  x_ijk = ξ̄_jk ( Σ_l D_il ȳ_kl ) / ( Σ_{i′∈I} Σ_l D_{i′l} ȳ_kl )  if ξ̄_jk > 0,  and x_ijk = 0 if ξ̄_jk = 0.

The proof is given in Appendix 1, along with a generalization to the case where several subsets of items are aggregated simultaneously. Extensions which accommodate suboptimal solutions to P_{I,b} are easy to obtain.
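The disaggregation step is mechanical once an optimal (x̄, ȳ, z̄, ξ̄) is in hand. Below is a minimal Python sketch of formula (14) on invented data (one facility, one customer, two aggregated items): each aggregate flow ξ̄_jk is split among the items of I in proportion to their shares of the aggregate demand at facility k.

```python
def disaggregate(xi, y, D, items, zones, facilities, customers):
    """Formula (14): split aggregate flows xi[j,k] into item flows x[i,j,k]."""
    x = {}
    for k in facilities:
        # aggregate demand at k (the right side of (3.1))
        agg = sum(D.get((i, l), 0.0) * y.get((k, l), 0.0)
                  for i in items for l in customers)
        for i in items:
            item_dem = sum(D.get((i, l), 0.0) * y.get((k, l), 0.0)
                           for l in customers)
            for j in zones:
                # if xi[j,k] > 0 then agg > 0 by (3.1), so the division is safe
                x[i, j, k] = 0.0 if xi[j, k] == 0 else xi[j, k] * item_dem / agg
    return x

# invented data: xi splits the aggregate demand of 30 between two zones
D = {(1, 1): 10.0, (2, 1): 20.0}
y = {(1, 1): 1.0}                 # customer 1 fully served by facility 1
xi = {(1, 1): 12.0, (2, 1): 18.0}
x = disaggregate(xi, y, D, [1, 2], [1, 2], [1], [1])
print(x[1, 1, 1], x[1, 2, 1], x[2, 1, 1], x[2, 2, 1])
```

The printed flows restore each item's demand constraint (3): item 1 receives 4 + 6 = 10 units and item 2 receives 8 + 12 = 20 units, matching Σ_l D_il y_kl.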

This theorem is a satisfying one in a number of respects. First, it allows for an arbitrary aggregation set I subject to the requirement that the items involved have a common set of transportation links (otherwise feasibility difficulties could be encountered in trying to recover a feasible solution to P_I from one of P_{I,b}). Second, it allows an arbitrary choice of b, which accommodates any heuristic rule that may be appealing in a particular situation (e.g., some weighted mean of c_ijk over i ∈ I). Third, it selects L in such a manner that the aggregated problem is a relaxation of the original one in a suitable sense, thereby producing an underestimate of the optimal value of the original problem. Fourth, this underestimate has an error that is known a priori to be no larger than a calculable number ε_b. Fifth, solving the aggregated problem is guaranteed to furnish a complete ε_b-optimal solution to P_I (one can very likely conclude that this solution is ε-optimal in P_I for some ε smaller than ε_b: just take the difference between the objective function (1) evaluated at the feasible solution and the lower bound v(P_{I,b})). And sixth, the explicit formula for ε_b has a number of valuable applications. We now expand on this last point.

An important question is how one should select b when a compelling heuristic choice is not available. The formula for ε_b furnishes a natural criterion: select b to make ε_b as small as possible. Happily, this can be converted to a linear programming problem by using standard tricks (mainly the representation of the maximum of a set of numbers as their least upper bound). Thus, the optimal b can always be calculated by linear programming.
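The bound (13) is cheap to evaluate directly. The following minimal sketch, on invented data (two aggregated items, two zones, one facility, one customer; all costs c_ijk, surrogates b_jk, and demands D_il are hypothetical), computes ε_b for two candidate choices of b so that the one with the smaller bound can be preferred.

```python
def eps_b(c, b, D, items):
    """epsilon_b of (13): sum over customers l of the max over facilities k
    of sum over aggregated items i of D_il * Range_j {c_ijk - b_jk}."""
    zones = sorted({j for (_, j, _) in c})
    facilities = sorted({k for (_, _, k) in c})
    customers = sorted({l for (_, l) in D})
    total = 0.0
    for l in customers:
        worst = 0.0
        for k in facilities:
            s = 0.0
            for i in items:
                diffs = [c[i, j, k] - b[j, k] for j in zones if (i, j, k) in c]
                s += D.get((i, l), 0.0) * (max(diffs) - min(diffs))
            worst = max(worst, s)
        total += worst
    return total

# invented data: two items aggregated, two zones, one facility, one customer
c = {(1, 1, 1): 4.0, (1, 2, 1): 6.0, (2, 1, 1): 5.0, (2, 2, 1): 5.0}
D = {(1, 1): 10.0, (2, 1): 20.0}
b_mix = {(1, 1): 4.0, (2, 1): 5.0}     # one candidate surrogate cost
b_item1 = {(1, 1): 4.0, (2, 1): 6.0}   # b matched to item 1's costs
print(eps_b(c, b_mix, D, [1, 2]), eps_b(c, b_item1, D, [1, 2]))
```

Here the mixed choice gives the smaller bound (30 versus 40), illustrating the selection criterion; the text's point is that the exact minimization of ε_b over b can be cast as a linear program rather than a search over candidates.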


The ε_b-minimizing choice of b can sometimes be obtained analytically if additional assumptions are imposed. For instance, if the D_il's are proportionally the same for i in I at every customer — i.e., if there exist proportions p_i, where p_i > 0 for i ∈ I and Σ_{i∈I} p_i = 1, such that

(15)  D_il / Σ_{i′∈I} D_{i′l} = p_i  for all il with i ∈ I

— and p_{i₀} ≥ Σ_{i≠i₀} p_i for some i₀ ∈ I, then it can be shown that the optimal choice of b is to take b_jk = c_{i₀jk} for all jk.

It is of interest to characterize the situations where ε_b = 0 is possible. It is shown in Appendix 2 that a necessary and sufficient condition for ε_b to equal 0 for some choice of b is that there exist numbers β_jk and γ_ik such that

(16)  c_ijk = β_jk + γ_ik  for all ijk with i ∈ I and k such that it is connected to some l with D_il > 0.

If this condition holds, then ε_b = 0 is achieved by taking b_jk = β_jk for all jk (plus any constant depending only on k) with k such that it is connected to some l for which Σ_{i∈I} D_il > 0. The choice of b_jk is arbitrary for any k's left over.

When might (16) hold? An important case occurs when item i has a procurement cost γ_i $/unit, and all items in I have the same unit inbound transportation rate when measured on a per mile basis, say t_I $/unit-mile. If the distance from j to k is d_jk, then

(17)  c_ijk = t_I d_jk + γ_i  for all ijk with i ∈ I,

and (16) clearly holds. This case admits an easy generalization that still leaves ε_b = 0: t_I can depend on j or k or both, and γ_i can depend on k.
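This separable-cost case is easy to check numerically. In the sketch below (all rates, distances, and demands are invented for illustration), the costs follow (17) and b_jk is set to the transportation part t_I·d_jk, so c_ijk − b_jk = γ_i is independent of j and every range in (13) vanishes.

```python
# invented data: two items sharing transportation rate t_I, one facility
t_I = 0.5
d = {(1, 1): 100.0, (2, 1): 60.0}   # distances d_jk from zone j to facility k
gamma = {1: 3.0, 2: 7.0}            # per-unit procurement costs gamma_i
items, zones, facilities, customers = [1, 2], [1, 2], [1], [1]
D = {(1, 1): 10.0, (2, 1): 20.0}    # demands D_il

# cost structure (17) and the matching surrogate b_jk = t_I * d_jk
c = {(i, j, k): t_I * d[j, k] + gamma[i]
     for i in items for j in zones for k in facilities}
b = {(j, k): t_I * d[j, k] for j in zones for k in facilities}

# epsilon_b of (13): sum over l of max over k of weighted ranges over j
eps = sum(
    max(
        sum(D.get((i, l), 0.0) *
            (max(c[i, j, k] - b[j, k] for j in zones) -
             min(c[i, j, k] - b[j, k] for j in zones))
            for i in items)
        for k in facilities)
    for l in customers)
print(eps)  # prints 0.0: c_ijk - b_jk = gamma_i does not vary with j
```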


We have achieved our goal of providing rigorous guidance to the modeler who wishes to consider aggregating a subset I of items in the procurement portion of a logistics planning model. Assuming that the aggregate constraints (2.2) offer adequate control of the procurement pattern, the modeler can obtain an a priori bound from (13) on the amount of suboptimality that will be caused in the model by subsequently collapsing the inbound flows for i in I down to a single transportation-like problem that uses any plausible costs b_jk for the aggregated items. It bears
emphasis that this bound can be calculated before optimizing the aggregated planning model, 
perhaps using rough preliminary data, and hence can be a useful tool for model design. 


The results attained can be used not only to study the effects of aggregation with a predetermined subset I of items, but also to select I itself on the basis of small anticipated aggregation error. This can be done by cluster analysis aimed at finding item subsets for which (16) holds approximately. One way to proceed is based on the following observation. Notice that if (16) holds exactly, then summing over j yields

Σ_j c_ijk = Σ_j β_jk + ||j||_i γ_ik  for all ik,

where ||j||_i is the number of procurement zones supplying item i. Thus, γ_ik can be eliminated in (16) using

γ_ik = Σ_j c_ijk / ||j||_i − Σ_j β_jk / ||j||_i

to obtain

(16)′  c_ijk − Σ_j c_ijk / ||j||_i = β_jk − Σ_j β_jk / ||j||_i  for all ijk with i ∈ I and k such that it is connected to some l with D_il > 0.

Conversely, (16)′ implies that (16) holds. Hence (16) and (16)′ are equivalent conditions. The obvious clustering approach would be to identify with each item i a linearized vector Vⁱ with typical entry

Vⁱ_jk = c_ijk − Σ_j c_ijk / ||j||_i  if link ijk exists,  and a large number M otherwise.

The Vⁱ-vectors would then be clustered by some standard technique [1] to discover subsets of i for which the Vⁱ's are nearly identical. These subsets of i would identify items which, if aggregated, would tend to have small aggregation error when an appropriate choice for b is used. In fact, an appropriate choice for b would be a virtual by-product of most standard clustering schemes.

A refinement would be to weight the Vⁱ's or their components according to demand or some measure of the likelihood that a given link would actually be selected by the model for use.
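As a minimal illustration of this screening idea, the sketch below builds the Vⁱ-entries of (16)′ and groups items whose vectors agree within a tolerance. A toy grouping rule stands in for a standard clustering package, the M-padding is omitted since all three invented items share the same links, and all cost data are hypothetical.

```python
def v_vector(c, i, zones, facilities):
    """Entries of V^i: c_ijk minus the mean of c_ijk over zones linked to (i,k)."""
    entries = {}
    for k in facilities:
        links = [j for j in zones if (i, j, k) in c]
        mean = sum(c[i, j, k] for j in links) / len(links)
        for j in links:
            entries[j, k] = c[i, j, k] - mean
    return entries

def cluster(c, items, zones, facilities, tol=1e-6):
    """Group items whose V^i-vectors are (nearly) identical."""
    groups = []
    for i in items:
        v = v_vector(c, i, zones, facilities)
        for g in groups:
            w = v_vector(c, g[0], zones, facilities)
            if v.keys() == w.keys() and all(abs(v[e] - w[e]) <= tol for e in v):
                g.append(i)
                break
        else:
            groups.append([i])
    return groups

c = {(1, 1, 1): 4.0, (1, 2, 1): 6.0,
     (2, 1, 1): 9.0, (2, 2, 1): 11.0,   # item 2: item 1's costs shifted by 5
     (3, 1, 1): 5.0, (3, 2, 1): 3.0}    # item 3: a different cost pattern
print(cluster(c, [1, 2, 3], [1, 2], [1]))
```

Items 1 and 2 satisfy (16) with a common β (their costs differ by a constant per item), so they fall into one candidate aggregation subset; item 3 is left on its own.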


[1] Anderberg, M. R., Cluster Analysis for Applications (Academic Press, New York, 1973).

[2] Bowersox, D. J., Logistical Management (Macmillan, New York, 1974).

[3] Geoffrion, A. M., "Elements of Large-Scale Mathematical Programming," Management Science, 16 (11), 652-691 (July 1970).

[4] Geoffrion, A. M. and G. W. Graves, "Multicommodity Distribution System Design by Benders Decomposition," Management Science, 20 (5), 822-844 (January 1974).

[5] Geoffrion, A. M., "Customer Aggregation in Distribution Modeling," Working Paper No. 259, Western Management Sciences Institute, UCLA (October 1976).

[6] Geoffrion, A. M., "Objective Function Approximations in Mathematical Programming," Mathematical Programming, to appear.



Let v(·) denote the infimal value of any minimizing optimization problem.

LEMMA 1 [6]: Consider the two optimization problems

(Q)  Minimize f(w) subject to w ∈ W

(Q̃)  Minimize f̃(w) subject to w ∈ W,

where f and f̃ are real-valued functions bounded below on a non-empty set W. (Interpret (Q) as the "true" problem and (Q̃) as the "approximating" problem in the sense that an approximate objective function f̃ replaces f.) Let ε and ε̃ be scalars (not necessarily nonnegative) satisfying

−ε̃ ≤ f(w) − f̃(w) ≤ ε  for all w ∈ W.

Then

v(Q̃) − ε̃ ≤ v(Q) ≤ v(Q̃) + ε,

and any optimal solution w̃ of (Q̃) is necessarily (ε + ε̃)-optimal in (Q).

Lemma 1 will be applied not to P_I in the role of (Q), but rather to an equivalent version of P_I, namely its "projection" [3] onto the variables y, z, and x with i ∉ I:

(P_I)*  Minimize_{x, y, z}  F(y, z) + Σ_{ijk: i∉I} c_ijk x_ijk + φ_I(y)

subject to (2.1), (3.2), (4), (5.1), (6),

where we define

(A3)  φ_I(y) ≜ Infimum_{x≥0} Σ_{ijk: i∈I} c_ijk x_ijk  subject to (2.2) and

Σ_j x_ijk = Σ_l D_il y_kl, all ik with i ∈ I

x_ijk ≥ 0, all ijk with i ∈ I.

Make the identifications:

w = the variables of (P_I)*

W = the constraints of (P_I)*

f(w) = the objective function of (P_I)*

f̃(w) = the objective function of (P_I)* with φ_I replaced by φ̃_I,

where φ̃_I(y) is defined as

(A4)  φ̃_I(y) ≜ L(y; b) + Infimum_{ξ≥0} Σ_jk b_jk ξ_jk  subject to (2.2A) and (3.1),

with L as defined in (11) for arbitrary fixed b. The justification for (A4) is provided by


LEMMA 2: Assume that the same jk links exist for every item in the subset I. Then

(A5)  φ̃_I(y) ≤ φ_I(y) ≤ φ̃_I(y) + ε_b,  all (y, z) satisfying (4) and (6),

where ε_b is defined as in (13).

Once Lemma 2 is established, conclusion (12) of the Main Theorem is at hand upon applying Lemma 1 using the identifications given above and the obvious facts v(Q) = v((P_I)*) = v(P_I) and v(Q̃) = v(P_{I,b}).

PROOF OF LEMMA 2: Introduce a supplementary nonnegative variable ξ_jk into (A3) for each jk link in existence for i ∈ I, along with the supplementary constraints

ξ_jk = Σ_{i∈I} x_ijk, all jk,

and the supplementary terms b_jk ξ_jk − b_jk ξ_jk in the objective function. From (2.2) we see that additional redundant constraints (2.2A) may be added, and from the demand constraints of (A3) we see that (3.1) may be added. Clearly none of this alters the infimal value of (A3). Upon "projection" of the augmented problem onto the ξ-variables, one obtains

(A3)*  φ_I(y) = Infimum_{ξ≥0} Σ_jk b_jk ξ_jk + R(ξ, y)  subject to (2.2A), (3.1),

where the remainder term is defined as

R(ξ, y) ≜ Infimum_{x≥0} Σ_{ijk: i∈I} (c_ijk − b_jk) x_ijk

subject to

Σ_j x_ijk = Σ_l D_il y_kl, all ik with i ∈ I

Σ_{i∈I} x_ijk = ξ_jk, all jk

x_ijk ≥ 0, all ijk with i ∈ I.

It is easy to verify that

R̲(y) ≤ R(ξ, y) ≤ R̄(y)  for all (y, z) satisfying (4) and (6) and ξ satisfying (2.2A) and (3.1),

where

R̲(y) ≜ Σ_kl ( Σ_{i∈I} D_il Min_j {c_ijk − b_jk} ) y_kl = L(y; b) as defined in (11)

R̄(y) ≜ Σ_kl ( Σ_{i∈I} D_il Max_j {c_ijk − b_jk} ) y_kl.

Since R̄(y) − R̲(y) clearly is no larger than

Σ_l Max_k { Σ_{i∈I} D_il [ Max_j {c_ijk − b_jk} − Min_j {c_ijk − b_jk} ] }

= Σ_l Max_k { Σ_{i∈I} D_il Range_j {c_ijk − b_jk} } = ε_b as defined in (13)

for any y satisfying (4), we have

(A6)  L(y; b) ≤ R(ξ, y) ≤ L(y; b) + ε_b  for all (y, z) satisfying (4) and (6) and ξ satisfying (2.2A) and (3.1).

The desired conclusion (A5) now follows easily from (A3)* and (A6). This completes the proof of Lemma 2.

Finally we come to the second conclusion of the Main Theorem. Let (x̄, ȳ, z̄, ξ̄) be any optimal solution to P_{I,b} and generate x⁺_ijk for i ∈ I according to

x⁺_ijk = ξ̄_jk ( Σ_l D_il ȳ_kl ) / ( Σ_{i′∈I} Σ_l D_{i′l} ȳ_kl ), all ijk with i ∈ I (with x⁺_ijk = 0 when ξ̄_jk = 0).

This "any feasible disaggregation of ξ̄" construction is possible because of the assumption that the same jk links exist for all i ∈ I. We must show that (x̄, x⁺, ȳ, z̄) is feasible and ε_b-optimal in P_I. The verification of feasibility is straightforward. To verify ε_b-optimality we need to show

Σ_{ijk: i∉I} c_ijk x̄_ijk + Σ_{ijk: i∈I} c_ijk x⁺_ijk + F(ȳ, z̄) ≤ v(P_I) + ε_b.

This is an obvious consequence of (12) and

v(P_{I,b}) ≤ Σ_{ijk: i∉I} c_ijk x̄_ijk + Σ_{ijk: i∈I} c_ijk x⁺_ijk + F(ȳ, z̄) ≤ v(P_{I,b}) + ε_b.

This last result, in turn, is a simple consequence of these two facts:

Σ_{ijk: i∉I} c_ijk x̄_ijk + Σ_jk b_jk ξ̄_jk + F(ȳ, z̄) + L(ȳ; b) = v(P_{I,b}),

which holds by the definition of (x̄, ȳ, z̄, ξ̄), and

L(ȳ; b) ≤ Σ_{ijk: i∈I} (c_ijk − b_jk) x⁺_ijk ≤ L(ȳ; b) + ε_b,

which can be simplified to

L(ȳ; b) ≤ Σ_{ijk: i∈I} c_ijk x⁺_ijk − Σ_jk b_jk ξ̄_jk ≤ L(ȳ; b) + ε_b.

This completes the proof of the Main Theorem.

REMARK: It is a straightforward matter to generalize the Main Theorem to cover the case where several disjoint subsets of items are to be aggregated, say I¹, . . ., Iᴴ. The analogs of P_I and P_{I,b} should be obvious. Assume for h = 1, . . ., H that the same jk links exist for every item in subset Iʰ and choose bʰ_jk arbitrarily for these links. Define

Lʰ(y; bʰ) ≜ Σ_kl ( Σ_{i∈Iʰ} D_il Min_j {c_ijk − bʰ_jk} ) y_kl.

Then

v(analog of P_{I,b}) ≤ v(analog of P_I) ≤ v(analog of P_{I,b}) + εᴴ_b,

where

εᴴ_b ≜ Σ_l Max_k { Σ_{h=1}^H Σ_{i∈Iʰ} D_il Range_j {c_ijk − bʰ_jk} },

and an εᴴ_b-optimal solution of the analog of P_I can be constructed in the obvious way. Note that εᴴ_b is smaller than the tolerance that would be obtained from H successive applications of the original version of the Main Theorem.


PROPOSITION: ε_b = 0 in expression (13) if and only if there exist numbers γ_ik such that

c_ijk = b_jk + γ_ik  for all ijk with i ∈ I and k such that it is connected to some l with D_il > 0.

PROOF: It is easy to see that ε_b = 0 if and only if

D_il Range_j {c_ijk − b_jk} = 0  for all possible ikl with i ∈ I

(for ikl to be possible, k must be connected to l and ijk must exist for some j),

which, by the nonnegativity of D_il and of the Range function, holds if and only if

(A7)  Range_j {c_ijk − b_jk} = 0  for all possible ik with i ∈ I and k such that it is connected to some l with D_il > 0.

Now the Range function has the property that it vanishes if and only if all of its arguments are identical, and so (A7) holds if and only if numbers γ_ik exist such that

c_ijk − b_jk = γ_ik  for all ijk with i ∈ I and k such that it is connected to some l with D_il > 0.

Egon Balas and Haakon Samuelsson†

Carnegie-Mellon University 
Pittsburgh, Pennsylvania 


This paper describes a node covering algorithm, i.e., a procedure for finding a 
smallest set of nodes covering all edges of an arbitrary graph. The algorithm is 
based on the concept of a dual node-clique set, which allows us to identify partial 
covers associated with integer dual feasible solutions to the linear programming 
equivalent of the node covering problem. An initial partial cover with the above 
property is first found by a labeling procedure. Another labeling procedure then 
successively modifies the dual node-clique set, so that more and more edges are 
covered, i.e., the (primal) infeasibility of the solution is gradually reduced, while 
integrality and dual feasibility are preserved. When this cannot be continued, the 
problem is partitioned and the procedure applied to the resulting subproblems.
While the steps of the algorithm correspond to sequences of dual simplex pivots, 
these are carried out implicitly, by labeling. The procedure is illustrated by exam- 
ples, and some early computational experience is reported. We conclude with a 
discussion of potential improvements and extensions. 


The problem for which this paper presents an algorithm can be stated as follows. Given an undirected graph G = (V, E), where V is the set of nodes and E the set of edges of G, i.e., E ⊂ V × V, find a subset V* ⊂ V of minimum cardinality, such that all members of E are incident with at least one member of V*. This problem, called in the literature the node covering problem, can be stated in integer programming format as

Z_NC = Min e_p^T y

(NC)  s.t.  A^T y ≥ e_q,  y ∈ {0, 1}^p,

where e_p (e_q) is a p-vector (q-vector) of ones, A is the node-edge incidence matrix of the graph G = (V, E), p = |V|, q = |E|, and T denotes transpose.

*This research was supported by the Office of Naval Research and the National Science Foundation. An 
earlier version of the paper was circulated under [1]. 

†Research for this article was performed prior to the death of Professor Samuelson in May, 1975.



By the transformation y′ = e_p − y we obtain the problem:

Z_NP = Max e_p^T y′

(NP)  s.t.  A^T y′ ≤ e_q,  y′ ∈ {0, 1}^p,

which is the node packing problem, i.e., the problem of finding a maximum-cardinality independent set of nodes in the same graph G. Thus, if V* is a minimum node cover, then V − V* is a maximum independent set of nodes, and Z_NC + Z_NP = p (see [8]).
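The complementarity Z_NC + Z_NP = p is easy to confirm by brute force on a small instance. A minimal Python sketch, on an invented graph, finds a minimum cover by enumeration and checks that its complement is an independent set:

```python
from itertools import combinations

# invented graph: a 4-cycle plus the chord (1, 3)
V = [1, 2, 3, 4]
E = [(1, 2), (2, 3), (3, 4), (4, 1), (1, 3)]

def min_cover(V, E):
    """Smallest node set touching every edge, by exhaustive search."""
    for r in range(len(V) + 1):
        for C in combinations(V, r):
            if all(u in C or v in C for (u, v) in E):
                return set(C)

cover = min_cover(V, E)
packing = set(V) - cover   # complement of a minimum cover
# the complement contains no edge, i.e., it is a node packing
assert all(not (u in packing and v in packing) for (u, v) in E)
print(len(cover), len(packing))   # Z_NC + Z_NP = 4 = |V|
```

Exhaustive search is of course only viable on toy graphs; the algorithm of this paper exists precisely because the general problem is hard.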

If G′ is the complement of G, i.e., G′ = (V, E′), where (i, j) ∈ E′ if and only if (i, j) ∉ E, then (NP) amounts to the problem of finding a maximum-cardinality clique in G′. (A clique is defined to be a complete subgraph.)

The node covering and node packing problems have several important practical applications 
(see for instance [5, 13, 14, 2]). The most famous one is formulated in terms of the node packing problem and concerns k-related subsets of a set.

Let S = {1, . . ., N} be the set. We then search for a largest number of subsets S_i ⊂ S of a fixed size |S_i| = n, ∀i, such that no two sets S_i, S_j have more than k elements in common. This problem is solved by associating nodes of a graph with all subsets of size n, and edges of the graph with all pairs of nodes corresponding to subsets with more than k common elements. If S is a set of treatments in a statistical experiment and the S_i are blocks, we have a classical problem in experimental design.

Another interpretation is that S is a set of prospective committee members and the subsets S_i a number of committees, where for some reason no group of k members should serve on more than one committee.

A common problem in information retrieval can be modeled in terms of a maximum clique problem. If pieces of information are represented by nodes of a graph and relations between them by edges of the same graph, the problem of finding a maximum totally related set of data elements clearly amounts to finding a maximum clique in the graph, or equivalently a maximum node packing in its complement.

Shannon [13] gives the following application to information theory. A set of signals contains a number of pairs that can be confused by a receiver. The problem of finding a largest set of signals to use so as to exclude the possibility of confusion is a node packing problem in a graph, with nodes representing signals, and with edges between those nodes corresponding to signals that can be confused.
A number of classical combinatorial problems such as Gauss' chess problem (place eight 
queens on a chess-board out of reach of one another) can also be modeled as node covering problems. 

Our theoretical interest in the node covering problem, however, derives primarily from the
fact that it is one of the few integer programs for which anything at all is known about the structure
of the convex hull of integer points feasible to the linear program

        Z_LNC = min e_p y
(LNC)   s.t.   A^T y ≥ e_q
               y ≥ 0

associated with (NC); and one is of course tempted to try to put this knowledge to use in some
solution procedure more efficient than those not using it.


For a close relative of our problem, namely the edge matching problem, which can be stated as

        Z_EM = max e_q x
(EM)    s.t.  A x ≤ e_p
              x ∈ {0, 1}^q,

the convex hull of integer points feasible to the associated linear program (LEM), in which x ∈ {0, 1}^q
is replaced by x ≥ 0, has been fully characterized by Edmonds [7], who has also given a polynomially
bounded algorithm for solving this problem.

The obvious connection between the edge matching and the node covering problem consists
in the fact that the linear programs associated with the two problems, (LEM) and (LNC), are
dual to each other, and thus an optimal solution (and, for that matter, any feasible solution) to
(LEM) gives a valid lower bound on Z_LNC, and hence on Z_NC. We will make ample use of this
connection and, more importantly, of a less obvious one which will be discussed in the next section;
namely, that with any feasible matching one can associate a partial cover, i.e., a solution satisfying
a subset of the constraints of (NC), which possesses certain desirable properties.
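The weak-duality bound just mentioned is easy to demonstrate: the cardinality of any matching is a lower bound on the cardinality of every node cover, since each matching edge must be met by a distinct cover node. The greedy routine below is our own minimal sketch, not the matching procedure the paper uses later.

```python
def greedy_matching(edges):
    """Greedy maximal matching: scan the edges once, keeping an edge whenever
    neither endpoint is matched yet. Any matching's cardinality bounds the
    size of every node cover from below (weak duality of (LEM) and (LNC))."""
    matched, matching = set(), []
    for (i, j) in edges:
        if i not in matched and j not in matched:
            matching.append((i, j))
            matched.update((i, j))
    return matching

# On the 5-cycle the greedy matching has 2 edges, bounding the cover size 3.
edges = [(1, 2), (2, 3), (3, 4), (4, 5), (5, 1)]
matching = greedy_matching(edges)
```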

Unlike for the edge matching problem, for the node covering problem the convex hull H_I of
the integer points feasible to (LNC) has so far been only partially characterized. Several classes
of facets have been identified by Padberg [12], Chvátal [6], Nemhauser and Trotter [11], Trotter
[15], and Balas and Zemel [3, 4]. (For a recent survey of this whole area, see Balas and Padberg [2].)
The simplest type of nontrivial facets, and the only ones that we will make use of, are the facets
associated with maximal cliques. To be more specific, let Q ⊆ V be any subset of pairwise adjacent
nodes of G; then the inequality

        ∑_{j∈Q} y(j) ≥ |Q| − 1,

where y(j) is the variable of (NC) associated with node j, defines a face of H_I, and if Q is maximal,
then it defines a facet, i.e., a (d−1)-dimensional face of H_I, where d = dim H_I.

Other classes of facets could be used in a way similar to the procedure discussed in this paper. 
The algorithm to be described below, however, only uses the facets associated with cliques. 

A couple of other results, not used by our algorithm, but related to our problem, are as follows. 
A graph is chordal if it has no cycles of size greater than three without chords; and it is a circle 
graph if its vertices can be identified with chords in a circle and its edges with pairs of chords that 
cross each other. For both cases, polynomially bounded algorithms for the node covering problem 
have been found by Gavril [9, 10]. 

In the next section we introduce the concept of a dual node-clique set that allows us to identify
partial covers in the graph that can be associated with dual feasible integer solutions to the linear
programming equivalent of (NC). We then give a labeling procedure which successively modifies
these partial covers so as to cover more and more edges of the graph, thus representing less and
less infeasible solutions to (NC), without giving up the dual feasibility property in relation to the
above-mentioned linear program. Since the procedure in general stops short of finding a primal
feasible (and thereby optimal) solution, some enumeration is necessary, and we describe how this
can be carried out within the same framework. Namely, when a problem is partitioned, the result-
ing dual node-clique sets need only be locally modified to serve the same purpose in the graphs
corresponding to the subproblems created by the partition. The algorithmic applications of these


ideas are described in detail, and finally some initial computational experience and some ideas on 
extensions of our algorithm are outlined. 


A set V̄ of nodes in a graph G = (V, E) is called a cover if every edge of G is incident with some
node in V̄. A cover is minimum if there exists no cover of smaller cardinality.

We will say that V̄ ⊆ V is a minimum partial cover if V̄ is a minimum cover in the subgraph G′
obtained from G by removing all edges not incident with any node in V̄.

A set Q ⊆ V of pairwise adjacent nodes will be called a clique.

As any other linear integer program, (NC) can be restated as a linear program whose con-
straints are (or include) the facets of the convex hull H_I of integer solutions to (LNC). Some of
these facets, as mentioned, are associated with the maximal cliques of G; and since the inequalities
of (LNC) are themselves associated with edges, which can be viewed as 2-cliques, (NC) can be
restated as the linear program

        min  ∑_{j∈V} y(j)
(LPNC)  s.t. ∑_{j∈Q_i} y(j) ≥ |Q_i| − 1,   ∀ Q_i ∈ K
             ∑_{j∈V} d_hj y(j) ≥ d_h0,     ∀ h ∈ F
             y ≥ 0.

Here y(j) denotes the variable associated with node j, while K is the family of all cliques of
G; hence the set of inequalities indexed by K includes all the facets of H_I associated with maximal
cliques, whereas the inequalities indexed by F are all those (not explicitly given) facets of H_I not
associated with cliques.

The dual of (LPNC) is the linear program

        max  ∑_{Q_i∈K} (|Q_i| − 1) v(Q_i) + ∑_{h∈F} d_h0 u_h
(LDNC)  s.t. ∑_{Q_i: j∈Q_i} v(Q_i) + ∑_{h∈F} d_hj u_h ≤ 1,   ∀ j ∈ V
             v, u ≥ 0.

Note that this problem is a relaxation of (EM), in the sense that a feasible solution to (LDNC)
can be associated with every feasible solution to (EM). Indeed, the variables of (EM)
are associated with the edges, hence the 2-cliques, of G; therefore, they are among the variables of
(LDNC) associated with the cliques of G. Further, there is a 1-1 correspondence between the
inequalities of (EM) and those of (LDNC), such that each variable of (EM) has the same coefficient
in the j-th inequality of (EM) as in the j-th inequality of (LDNC).

For any node set V̄ ⊆ V, the vector y ∈ R^p defined by

        y(j) = 1,  j ∈ V̄
        y(j) = 0,  j ∈ V − V̄

will be called the solution defined by V̄, and denoted y(V̄). A solution y(V̄) will be termed dual
feasible if [y(V̄), s], where s is the vector of slack variables whose value is uniquely determined by
y(V̄), is a basic dual feasible solution to (LPNC) restated in equality form.

The following theorem gives a sufficient condition for a node set V̄ to be a minimum partial
cover.

THEOREM 1: V̄ ⊆ V is a minimum partial cover if there exists a set K̄ of cliques Q_i ⊆ V,
i = 1, . . ., t, such that

(i) Q_i, Q_j ∈ K̄, Q_i ≠ Q_j ⇒ Q_i ∩ Q_j = ∅
(ii) j ∈ V̄ ⇒ j ∈ Q_i for some Q_i ∈ K̄
(iii) |Q_i − V̄| = 1 for all Q_i ∈ K̄.

PROOF: We will show that y(V̄) is dual feasible, hence optimal for the problem defined by
those constraints that it satisfies. The vector (v, u), defined by u_h = 0, ∀ h ∈ F, and

        v(Q) = 1,  Q ∈ K̄
        v(Q) = 0,  Q ∈ K − K̄,

satisfies the constraints of (LDNC), since the cliques Q_i ∈ K̄ are disjoint and thus the left hand side
of each inequality is at most 1.

Further, from (ii) we have

        e_p y(V̄) = |V̄| = ∑_{Q_i∈K̄} (|Q_i| − 1),

and from (iii)

        ∑_{j∈Q_i} y(j) = |Q_i| − 1,   ∀ Q_i ∈ K̄,

i.e., complementary slackness holds for the pair of solutions y(V̄) and (v, u). Hence y(V̄) is dual
feasible. Q.E.D.
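Conditions (i)-(iii) of Theorem 1 (pairwise disjoint labeled cliques, every cover node in a labeled clique, and exactly one node of each labeled clique outside the cover) are simple to check mechanically. The helper below is a hypothetical verification aid of our own, not part of the paper's algorithm, which maintains the conditions implicitly through labeling.

```python
def is_dual_node_clique_set(v_bar, cliques):
    """Check conditions (i)-(iii) of Theorem 1 for a node set v_bar ("V-bar")
    and a family of labeled cliques, each given as a set of nodes."""
    cl = [set(q) for q in cliques]
    # (i) the labeled cliques are pairwise disjoint
    disjoint = all(not (cl[a] & cl[b])
                   for a in range(len(cl)) for b in range(a + 1, len(cl)))
    # (ii) every node of the partial cover lies in some labeled clique
    covered = all(any(j in q for q in cl) for j in v_bar)
    # (iii) each labeled clique has exactly one node outside the cover
    one_out = all(len(q - set(v_bar)) == 1 for q in cl)
    return disjoint and covered and one_out
```

For instance, V̄ = {1, 3} with labeled 2-cliques {1, 2} and {3, 4} satisfies all three conditions, whereas V̄ = {1, 2} with the single labeled 2-clique {1, 2} violates (iii).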

The algorithm to be discussed below generates a sequence of minimum partial covers V̄, and
associated sets K̄ of cliques, satisfying requirements (i), (ii), (iii) of Theorem 1. Since each pair
(V̄, K̄) defines, as shown above, solutions to a dual pair of linear programs, we will call (V̄, K̄) a
dual node-clique set. The cliques in K̄ will be distinguished from each other and from those in K − K̄
by labeling. Thus a dual node-clique set (V̄, K̄) consists of a minimum partial cover V̄ and a collec-
tion K̄ of labeled cliques. These dual node-clique sets play a central role in the procedure to follow.

Our algorithm is an enumerative procedure consisting of the following major steps, to be
described in detail in the following sections.

Finding an Initial Solution 


A starting dual node-clique set (V̄, K̄) is generated by a one-pass labeling procedure.


Reducing Infeasibility 

Given a dual node-clique set (V̄, K̄) for the current subproblem, primal infeasibility of the solu-
tion y(V̄) is gradually reduced, while integrality and dual feasibility are preserved. This is accom-
plished by a labeling procedure which successively updates the dual node-clique set (V̄, K̄), and
whose steps correspond to sequences of dual simplex pivots in (LPNC). The number of steps in this
procedure is bounded by p(q+1).

If the solution becomes feasible then it is optimal for the current subproblem. Another sub- 
problem is selected and step 1 is repeated. 

If the infeasibility-reducing procedure ends before the current subproblem becomes feasible, 
then the latter is partitioned. 

Branching and Reestablishing Dual Feasibility 

The current subproblem is split into two new subproblems, one in which a certain node j is 
forced into the cover, and another one in which node j is forced out of the cover, while all nodes 
adjacent to j are forced into the cover. The nodes that are forced (into or out of the cover) are 
removed from the graph along with the edges incident with them, so that each new subproblem is 
associated with a proper subgraph of the graph of the parent problem. 

Dual node-clique sets (V̄, K̄) are determined for each new subproblem from the dual node-
clique set of the parent problem. If either of the two new subgraphs is bipartite, the corresponding
subproblem is solved as an assignment problem.

Bounding and Subproblem Selection 

For each new subproblem created by branching, a lower bound on the objective function
value is available from the dual feasible solution y(V̄). Another bound, often sharper, is generated
by solving (LNC), which is accomplished by solving an associated assignment problem. The
subproblem with the smallest lower bound is then selected for processing.

With a proper selection rule, the depth of the search-tree is bounded by the cardinality of 
a minimum node cover. 


Notice that (V̄_M, K̄_M), where K̄_M is the set of 2-cliques (pairs of nodes) specified by a matching
M in G, and V̄_M contains one of the two nodes of each such pair, satisfies the requirements (i), (ii), (iii)
of Theorem 1 and hence defines a dual feasible solution to (LPNC).

Thus, the starting point for our procedure is an edge matching M in G, which gives rise to
a minimum partial cover V̄_M of the same cardinality.

Of course, the higher the cardinality of M, and thereby the value of the initial dual feasible
solution y(V̄_M), the better. A natural choice for M would therefore be a maximum matching,
i.e., an optimal solution to (EM). The fact that there is a polynomially bounded algorithm for
that problem also seems to support this view.


We found, however, that very good starting solutions could be obtained by means of a simpli-
fied procedure that finds a reasonably good matching in just one pass through the graph. The
procedure, which is similar in spirit to Edmonds' algorithm, is the following:

(a) Starting from any node in the graph, attempt to partition V into two sets by alternatingly
putting adjacent nodes into different sets.

(b) If (a) is interrupted by an inconsistency, an odd cycle C_i has been located and can be
identified by tracing back along the paths by which the latest node was reached. The
graph is then reduced by shrinking C_i, i.e., replacing it by a single node adjacent to all
nodes j ∉ C_i adjacent to C_i.

(c) When the node set of the reduced graph has thus been partitioned, a maximum matching
is found in the reduced (bipartite) graph.

(d) Odd cycles are successively unshrunk and k_i edges of each cycle C_i are added to the
matching, where k_i = ⌊|C_i|/2⌋.

PROPOSITION: The above procedure is consistent and finds a matching in G.

PROOF: The steps (a)-(c) can always be carried out, since a graph is bipartite if and only if
it has no odd cycles. To prove the validity of (d) we use an inductive argument. Suppose that at a
given stage of the unshrinking step we have a matching M′ in the current graph G′ and we want
to unshrink the node n_Ci that represents an odd cycle C_i. Since M′ is a matching in G′, at most one edge
of M′ is incident with n_Ci. Let (j, n_Ci) be this edge. Then the graph G″ obtained from G′ by un-
shrinking n_Ci has an edge (j, h) for some h ∈ C_i.

Pick any such edge to be put, along with the edges of M′, in the new matching M″ to be
defined in G″. This will leave exactly 2k_i nodes of C_i exposed, which allows us to
place k_i matching edges of C_i into M″. Since we can always find a matching in a bipartite graph
and the above procedure can be reapplied whenever a node is unshrunk, the proof is complete.
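The core of step (a), attempting a 2-coloring and detecting the inconsistency that witnesses an odd cycle, can be sketched as below. This is only a fragment of the initializer: the shrinking of the detected cycle and the subsequent matching steps are omitted, and the function name and return convention are ours.

```python
from collections import deque

def two_color_or_odd_cycle(nodes, adj):
    """Step (a) sketch: try to 2-color the graph by breadth-first search.
    Returns ('bipartition', (A, B)) on success, or ('odd_cycle', (u, v)) when
    two adjacent nodes receive the same color, witnessing an odd cycle that
    the full procedure would then shrink."""
    color = {}
    for s in nodes:
        if s in color:
            continue
        color[s] = 0
        queue = deque([s])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in color:
                    color[v] = 1 - color[u]
                    queue.append(v)
                elif color[v] == color[u]:
                    return ('odd_cycle', (u, v))
    A = {u for u, c in color.items() if c == 0}
    B = {u for u, c in color.items() if c == 1}
    return ('bipartition', (A, B))

# The 5-cycle triggers odd-cycle detection; a 3-node path is 2-colored.
c5 = {i: [(i - 1) % 5, (i + 1) % 5] for i in range(5)}
kind, witness = two_color_or_odd_cycle(list(range(5)), c5)
path = {0: [1], 1: [0, 2], 2: [1]}
kind2, parts = two_color_or_odd_cycle([0, 1, 2], path)
```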

To illustrate the portion of the algorithm that has been described so far, consider the graph
shown in Figure 1. The crossed edges constitute the feasible matching M found by our procedure,
which in this case also happens to be maximum, as is easily verified. The pairs of nodes defined by
M constitute the initial set K̄ of labeled cliques. The crossed nodes are in the corresponding minimum
partial cover V̄_M.

We see that two edges (the circled ones) are not covered, and the associated solution y(V̄), while
dual feasible, is therefore not primal feasible. We now turn to the method for improving a minimum
partial cover by introducing into K̄ cliques of cardinality greater than 2.


The following procedure is used to reduce the (primal) infeasibility of a given dual feasible
solution, while preserving integrality and dual feasibility. Each of the steps 2(a), 2(b), or 2(c)
below corresponds to one or several dual simplex pivots.

1. Scan E for edges that are not covered by V̄, i.e., for which the corresponding constraint in
(LPNC) is not satisfied.




Figure 1 

2. For each edge (i, j) that is not covered, attempt to perform one of the following steps in turn:

(a) (First labeling step)

If neither i nor j belongs to a labeled clique, label the 2-clique {i, j}. Then put into
V̄ either i or j, whichever covers more edges not yet covered.

(b) (Reassignment step)

If either i or j can replace in V̄ one of the members of the labeled clique to which it
belongs without creating any infeasibilities, make the switch to cover (i, j).

(c) (Second labeling step)

Find a largest unlabeled clique Q*, if it exists, such that

(i) i, j ∈ Q* and |Q*| ≥ 3
(ii) Q_h ∩ Q* = ∅ for all Q_h ∈ K̄ such that Q_h ⊄ Q* and |Q_h| ≥ 3
(iii) Q* contains a labeled clique or a node not belonging to a labeled clique.

Then proceed as follows:

(α) Label Q*
(β) Put into V̄ all but one of the nodes in Q*
(γ) Delete from V̄ all nodes j ∈ V̄ − Q* belonging to labeled 2-cliques incident
with Q*
(δ) Delete from K̄ ("unlabel") all labeled cliques contained in Q* and all
labeled 2-cliques incident with Q*.

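Step 1 and the first labeling step 2(a) above can be sketched as follows. This is a partial, illustrative rendering of the procedure under our own data conventions (a set `v_bar` for V̄ and a list `labeled` of node sets for K̄); the reassignment and second labeling steps are not implemented here.

```python
def uncovered_edges(edges, v_bar):
    """Step 1: edges whose constraint y(i) + y(j) >= 1 is violated by y(v_bar)."""
    return [(i, j) for (i, j) in edges if i not in v_bar and j not in v_bar]

def first_labeling_step(edges, v_bar, labeled):
    """Step 2(a) sketch: for an uncovered edge whose endpoints lie in no labeled
    clique, label the 2-clique {i, j} and put into v_bar the endpoint covering
    more currently uncovered edges. Returns True if a step was performed."""
    in_labeled = set().union(*labeled) if labeled else set()
    for (i, j) in uncovered_edges(edges, v_bar):
        if i in in_labeled or j in in_labeled:
            continue  # step 2(a) does not apply; 2(b)/2(c) would be tried next

        def gain(n):  # number of still-uncovered edges incident with n
            return sum(1 for e in uncovered_edges(edges, v_bar) if n in e)

        pick = i if gain(i) >= gain(j) else j
        labeled.append({i, j})
        v_bar.add(pick)
        return True
    return False

# On the path 1-2-3, labeling {1, 2} and picking node 2 covers both edges.
edges = [(1, 2), (2, 3)]
v_bar, labeled = set(), []
first_labeling_step(edges, v_bar, labeled)
```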


To continue the above example, we see that one of the uncovered edges is in a 4-clique Q_α.
If Q_α is labeled, we can bring the node n_α into V̄ and get the situation shown in Figure 2.

We are left with one uncovered edge, and to eliminate it we label the triangle Q_β and put n_β
into V̄ (see Figure 3). This, however, forces n_γ out of the solution and gives rise to two new uncovered
edges.

Labeling any of the triangles on which n_γ lies makes it possible to reintroduce n_γ into the
cover. We then get the dual node-clique set (V̄, K̄) shown in Figure 4, where K̄ contains the circled
cliques, plus the 2-cliques corresponding to the crossed edges. This solution happens to be (primal)
feasible and is thus also optimal, which means that the crossed nodes in Figure 4 constitute a mini-
mum node cover.

Next we prove the following property of the above procedure. Let

        E(y) = {(i, j) ∈ E | y(i) + y(j) ≥ 1}.

THEOREM 2: The infeasibility-reducing procedure produces a sequence of dual node-clique
sets (V̄_i, K̄_i), and associated dual feasible integer solutions y(V̄_i) = y^i, i = 1, . . ., r, such that

(α) e_p y^i ≥ e_p y^{i-1}, i = 2, . . ., r
(β) y^i satisfies all constraints corresponding to cliques Q ∈ K̄_{i-1}, |Q| ≥ 3, satisfied by y^{i-1}
(γ) If y^i is obtained by step (a) or (b), then |E(y^i)| > |E(y^{i-1})|; and in the case of step (a),
e_p y^i = e_p y^{i-1} + 1
(δ) If y^i is obtained by step (c), then y^i satisfies a constraint corresponding to a clique Q ∈ K̄_i,
|Q| ≥ 3, which is violated by y^{i-1}.


Figure 2 



Figure 3 

PROOF: We first prove that the conditions of Theorem 1 are preserved by the procedure.

(i) Step 2(a) labels a 2-clique whose nodes do not belong to any labeled clique; step 2(b) does not label
any clique; and step 2(c) labels a clique Q* only if Q* is disjoint from all labeled cliques Q such
that |Q| ≥ 3, while it "unlabels" (removes from K̄) all labeled cliques which have a nonempty
intersection with Q*. Thus each of the three steps leaves all the labeled cliques pairwise disjoint,
which is condition (i) of Theorem 1.

(ii) If a node is put into V̄ under any of the steps 2(a), 2(b), or 2(c), it belongs to some labeled
clique. Conversely, if a labeled clique Q is "unlabeled," which can only happen in step 2(c), then
Q is either contained in another labeled clique Q*, or is a 2-clique such that Q ∩ Q* ≠ ∅. In the
latter case, if the node in Q − Q* belongs to V̄, it is taken out of V̄. Hence the set V̄ resulting from any
of the three steps is contained in ∪_{Q_i∈K̄} Q_i, which is condition (ii) of Theorem 1.

(iii) Step 2(a) labels a 2-clique Q and puts into V̄ one of the two nodes of Q, thus making sure
that |Q ∩ V̄| = 1. Step 2(b) replaces a node j ∈ Q of V̄ by another node h ∈ Q, which leaves |Q ∩ V̄| un-
changed. Step 2(c) labels a new clique Q* and puts into V̄ all but one of the nodes of Q*. Hence
each step preserves property (iii) of the node-clique set (V̄, K̄).

Thus, from Theorem 1, the above procedure generates a sequence of dual node-clique sets
(V̄_i, K̄_i), and associated dual feasible integer solutions y^i. To show that the sequence has the other
properties claimed in Theorem 2, we examine each of the steps 2(a), 2(b), 2(c) in turn.




Figure 4 

Step 2(a). Under this step V̄_i is replaced by V̄_{i+1} = V̄_i ∪ {j} for some j ∈ V − V̄_i; thus (α), (β) and
(γ) clearly hold.

Step 2(b). Under this step V̄_i is replaced by V̄_{i+1} = (V̄_i − {j}) ∪ {h} for some pair {j, h} belonging
to the same labeled clique Q; hence (α) holds after the step. Also, since j and h do not belong to any
other labeled clique but Q, (β) holds. Finally, the pair {j, h} is chosen so as to increase by one the
number of edges covered by V̄_i, i.e., (γ) also holds.

Step 2(c). Under this step, V̄_i is replaced by V̄_{i+1} = (V̄_i − S) ∪ S′, where S and S′ are the subsets
of nodes deleted from and introduced into V̄_i under steps 2(c)(γ) and 2(c)(β) respectively.

We claim that |S′| ≥ |S|. From 2(c)(γ), each node of S belongs to a 2-clique which intersects
Q* without being contained in it. For each such 2-clique {i, j}, where i ∈ S, we have j ∈ Q* − V̄_i.
Let S″ be the set of these nodes, i.e.,

        S″ = {j ∈ Q* − V̄_i | {i, j} ∈ K̄_i, i ∈ S}.

From what we just said, |S″| ≥ |S|, i.e., Q* contains at least |S| nodes belonging to 2-cliques of
the above type and not belonging to V̄_i. By property (iii), Q* also contains at least one additional
node not belonging to V̄_i, which raises the number of nodes in Q* − V̄_i to at least |S| + 1. Since by
step 2(c)(β) all but one of the nodes in Q* − V̄_i are introduced into V̄_{i+1}, it follows that the number
|S′| of such nodes is at least equal to |S|. This proves the claim.



From |S′| ≥ |S|, it follows that property (α) of Theorem 2 holds after step 2(c). Properties
(β) and (δ) also hold, since no vertex belonging to a clique of cardinality k ≥ 3 is deleted from
V̄_i, and V̄_{i+1} contains all but one of the vertices of the newly labeled clique Q*, where |Q*| ≥ 3 and
|V̄_i ∩ Q*| ≤ |Q*| − 2, i.e., the constraint of (LPNC) corresponding to Q* is satisfied by y^{i+1}, but
not by y^i. The last inequality follows from the fact that if (i, j) is the edge found under step 1,
then {i, j} ⊆ Q* − V̄_i. Q.E.D.

COROLLARY 2.1: After at most p(q+1) steps, of which at most p are of type 2(c), the
infeasibility-reducing procedure either finds an optimal solution or cannot be continued.

PROOF: Each time step 2(c) is applied, the number of nodes belonging to labeled cliques of
cardinality k ≥ 3 is increased by at least one, while the other steps leave this number unchanged.
Hence |V| = p is an upper bound on the total number of steps 2(c).

Each of steps 2(a) and 2(b) increases by one the number of edges covered; hence |E| = q is an
upper bound on the number of steps 2(a) and 2(b) between any two consecutive steps 2(c). Since
the total number of steps 2(c) is bounded by p, the total number of steps 2(a) and 2(b) is bounded
by p × q.

Summing up the above, p × q + p = p(q+1) is an upper bound on the total number of steps in
the procedure. Q.E.D.


Although in the above example the infeasibility-reducing procedure found the optimal solution
directly, this cannot be expected in general, as illustrated by the example shown in Figure 4a.
Assuming that the 4-clique is labeled, the procedure will stop short of removing the infeasibilities
at the edges e_1, e_2.

In such a case we branch, i.e., partition the current subproblem.

Our partitioning rule is the usual one, i.e., we create two subproblems by setting a variable in
(LPNC) to 0 and 1 respectively. However, due to the special structure of the problem [i.e., the
presence of constraints of the form y(i) + y(j) ≥ 1, (i, j) ∈ E], y(j) = 0 implies y(i) = 1 for all i adjacent
to j, and thus the partitioning rule becomes

        {y(j) = 1} ∨ {y(j) = 0, y(i) = 1, ∀i: (i, j) ∈ E}.

Figure 4a


For both subproblems created by the partition the graph can be reduced accordingly. 

Since the current solution may not be dual feasible for the subproblems, it is necessary to re-
establish this property. It is one of the main advantages of our approach that the all-integer dual
feasible solution associated with the original graph can be made valid for the subproblems by only
local modifications.

The partitioning procedure can then be stated as follows. 

Let A(j) = {i ∈ V | (i, j) ∈ E}.

0. Choose i ∈ V − V̄ such that

        |A(i) − V̄| = max_{j∈V−V̄} |A(j) − V̄|.

1. Partition the current problem by

        {y(i) = 1} ∨ {y(i) = 0, y(j) = 1, ∀j ∈ A(i)}.

This creates two subproblems and associated subgraphs, as follows.

SUBPROBLEM 1 is (LPNC) defined on the graph G_1, obtained from G by removing node i
and all edges (i, j), j ∈ V.

SUBPROBLEM 2 is (LPNC) defined on the graph G_2, obtained from G by removing node i
and all nodes j ∈ A(i), together with all edges (i, j) and (j, h), j ∈ A(i), h ∈ V.

2. Redefine (V̄, K̄) as follows:

Let (V̄^old, K̄^old) and (V̄^new, K̄^new) denote the dual node-clique sets of the parent problem and
(any) one of the subproblems, respectively. Let V^new be the node set of the new graph, and let

        I = {i ∈ V − V̄^old | y(i) is fixed at 1 and i ∈ Q for some Q ∈ K̄^old}.

For each i ∈ I, choose a node j(i) ∈ Q ∩ V^new having as few adjacent nodes in V^new − V̄^old as possible,
and let J = {j(i) | i ∈ I}. Then define

        V̄^new = V^new ∩ (V̄^old − J).

Further, let

        S = {Q ∈ K̄^old | Q ⊄ V^new}
        S′ = {Q ∩ V^new | Q ∈ S, |Q ∩ V^new| ≥ 2}.

Then define

        K̄^new = (K̄^old − S) ∪ S′.

To illustrate the above on a graph that requires some enumeration, consider the example
shown in Figure 5. The maximum matching and the corresponding partial cover are crossed. There
is an infeasibility in the 4-clique, but that is immediately eliminated by labeling it. The remaining
infeasibility cannot be removed, though (Figure 6). (Note that the infeasible edge e_4 in Figure 6 is
adjacent to an odd cycle that is not a clique.)

The procedure specifies n_α as the branching node. Subproblem 2 is given by

        y(n_α) = 0 and y(i) = 1 for all i ∈ A(n_α).

This leaves the graph shown in Figure 7. We have feasibility and thus an optimal cover for
this subgraph, which, together with the nodes forced into the cover by the branching step, gives a
cover of cardinality 9 for the initial graph G.



Subproblem 1 is defined by y(n_α) = 1, and shown in Figure 8. Note that the node sharing a
labeled 2-clique with n_α had to be taken out of V̄ to preserve complementarity.

That node now lies in a clique, but one which cannot be labeled, since it already contains a
node from a labeled clique. So we branch on this node. For Subproblem 1, defined by fixing its
variable at 1, we get a solution that is immediately found to be feasible. Subproblem 2, defined by
fixing the variable at 0 and setting y(i) = 1 for all adjacent nodes i, is shown in Figure 9.

The labeled 4-clique is now reduced to a triangle. The infeasibility caused by forcing the
branching node out of V̄ is immediately corrected by labeling the triangle containing it, and we
have an optimal solution to the subproblem.

Thus we terminate with the search tree shown in Figure 10. 

We have found and verified three alternative optimal node covers of cardinality nine.

We now prove the following property of the procedure. 

THEOREM 3: (V̄^new, K̄^new) is a dual node-clique set if (V̄^old, K̄^old) is.

PROOF: We show that conditions (i), (ii), (iii) of Theorem 1 are satisfied.

(i) This property is clearly preserved, since all new members of K̄ are obtained from old
members by deletion of some nodes.

(ii) From the definition of K̄^new, the only nodes j ∈ V^new ∩ V̄^old contained in some clique of K̄^old
but in no clique of K̄^new are those belonging to some 2-clique {i, j} ∈ K̄^old such that i ∈ V − V^new. But since
j ∈ V̄^old, from property (iii) of (V̄^old, K̄^old) it follows that i ∈ V − V̄^old. Further, the variable y(i) must
have been fixed at 1, for if it had been fixed at 0, then y(j) would have been fixed at 1 and j would
not be a node of the new graph. Hence i ∈ I, and therefore j ∈ J, where I and J are the sets used in the
definition of V̄^new. Thus j ∈ V^new − V̄^new, which proves that property (ii) holds for (V̄^new, K̄^new).

(iii) For any clique Q ∈ K̄^new ∩ K̄^old, from the definition of V̄^new we have Q ∩ V̄^new = Q ∩ V̄^old;
hence property (iii) is preserved.

Figure 5

For each clique Q ∈ S′ = K̄^new − K̄^old, there exists some Q′ ∈ K̄^old such that Q ⊂ Q′. Then (Q′ − Q)
∩ (V − V̄^old) is either empty, or consists of one element, say i. In the latter case, i ∈ I and an element
j(i) of Q is removed from V̄^old, according to the rule defining V̄^new; while in the former case, Q ∩ V̄^new =
Q′ ∩ V̄^old ∩ V^new. In both cases, |Q′ ∩ V̄^old| = |Q′| − 1 implies |Q ∩ V̄^new| = |Q| − 1. Q.E.D.

Figure 6 


Figure 7 




Before any further work is done on a newly created subproblem, we check whether the as- 
sociated graph is bipartite. If so, then the subproblem is solved as an assignment problem. Other- 
wise we turn to the task of calculating bounds. 

Figure 8 

Figure 9 



Figure 10 

For all the subproblems created in the enumeration process, a lower bound on the optimal 
objective value is available from the dual feasible all-integer solution. 

Another such bound is given by the solution to (LNC), and our experience indicates that the
latter is often, though not always, sharper. To facilitate fathoming by comparison with the cur-
rently best feasible solution, we found it worthwhile to solve (LNC) for each subproblem. This is
rather inexpensive, thanks to the following result, attributed by Trotter [15] to Edmonds and
Pulleyblank, which makes (LNC) equivalent to an assignment problem in a bipartite graph twice
the size of G.

THEOREM 4 (Edmonds and Pulleyblank): Define a bipartite graph G′ = (V ∪ V′, E′), where
V′ is a copy of V, i.e., i′ ∈ V′ if i ∈ V, and E′ = {(i, j′) | i ∈ V, j′ ∈ V′, (i, j) ∈ E}. Let (x, x′), where x and x′
are 0-1 vectors having one component for each node of V and V′ respectively, be an optimal solu-
tion to (NC) defined on G′. Then

        y = (x + x′)/2

is an optimal solution to (LNC) defined on G.

PROOF: Obvious (see [15]).
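The doubling construction of Theorem 4 can be sketched as follows. Note one substitution: where the paper solves the bipartite problem as an assignment (matching) problem, this sketch brute-forces the cover of G′ for the sake of a self-contained, tiny example; the function name `lnc_bound` is ours.

```python
from itertools import combinations

def lnc_bound(nodes, edges):
    """Lower bound from Theorem 4: build the bipartite doubling
    G' = (V u V', E'), E' = {(i, j'), (j, i') : (i, j) in E}, solve (NC) on G'
    (here by brute force; in practice by a matching/assignment code), and
    return half the optimal cover size, the value of (LNC) on G."""
    prime = {v: ('p', v) for v in nodes}          # v' = the copy of node v
    d_nodes = list(nodes) + [prime[v] for v in nodes]
    d_edges = ([(i, prime[j]) for (i, j) in edges]
               + [(j, prime[i]) for (i, j) in edges])
    for size in range(len(d_nodes) + 1):          # brute-force min cover of G'
        for sub in combinations(d_nodes, size):
            s = set(sub)
            if all(a in s or b in s for (a, b) in d_edges):
                return size / 2.0
    return float(len(nodes))

# For the 5-cycle, G' is a 10-cycle with cover number 5, so the bound is 2.5,
# which rounds up to the true minimum cover size 3.
bound = lnc_bound([1, 2, 3, 4, 5], [(1, 2), (2, 3), (3, 4), (4, 5), (5, 1)])
```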

In case the objective value found for (LNC) is fractional, the bound can of course be rounded
upwards. In principle one could apply to fractional solutions the usual penalties used in branch and
bound. It turns out, however, that due to the high degeneracy of assignment problems, nothing
is usually gained that way.

Each time two new subproblems are created, attempts are made to fathom both. If this is 
not possible, one of them is stored and the other is partitioned again. In this choice, preference 
is given to the subproblem where a variable is fixed at 0. 

When both of the current subproblems are fathomed, a problem has to be selected from stor-
age. In this choice the algorithm is guided by the tightest of the two lower bounds just mentioned,
and the problem with the lowest such bound is considered next.

To conclude the description of our algorithm we give a flow chart of the entire procedure 
(Figure 11). 

In the above version of the algorithm, after each branching step we start processing one of 
the two newly created subproblems. We found this preferable to choosing the subproblem with 
the lowest bound after each branching step, since it keeps storage requirements low. 

If, however, the other rule is preferred, then one can state the following. 

THEOREM 5: If the subproblem with the lowest bound is selected for processing after each 
branching step, then the cardinality of a minimum node cover is an upper bound on the depth of 
the search tree. 

PROOF: At each branching step, at least one new variable is fixed at 1 in each of the two
subproblems created (if y(i) is chosen for branching, A(i) ≠ ∅). Q.E.D.


The version of our algorithm illustrated in Figure 11 was programmed in Fortran IV for a
Univac 1108 computer. Since the programming was undertaken mainly to test the validity of our
approach, no particular care was taken to program the algorithm in an efficient manner at this
stage. Instead, maximum flexibility in testing alternative procedures was emphasized, and the
results were evaluated primarily in terms of the number of iterations in the various parts of the
algorithm.

As should be evident from the previous description of the algorithm, all calculations can be 
carried out by labeling in a computer representation of the graph. Thus we work essentially with: 

(a) A list of size p, indicating for each node whether it is in V and whether it belongs to a mem- 
ber of K, and if so, which one 

(b) A list of edges 

(c) An adjacency matrix 

(d) A problem list. For each subproblem on the list it is only necessary to save (a), since (b)
and (c) are basically the same for all of them. This enables one to keep fairly large problems entirely
in core, although the very limited memory size of the 1108 (30K words available and no possibility
to use halfwords) has only allowed us to attempt problems with up to 50 nodes and 400 edges.




[Figure 11: flow chart of the entire procedure. Legible fragments include: find a matching and
associated partial cover; select the solution with the best lower bound; find a node i ∉ V̄ covering
the maximum number of uncovered edges; define the subproblems (a) y(i) = 0, y(j) = 1, j ∈ A(i),
and (b) y(i) = 1; modify the solutions; return to the loop or stop.]

Figure 11



A number of node covering problems randomly generated by Trotter [14] were kindly made 
available to us by him. The following tableau contains results on those of his problems that do not 
violate the above space requirements. 


Number of 

Number of 

Number of 

applications of 



Number of 

applications of 


reducing routine 














































We see that the number of operations increases with the density of the graph (i.e., the number 
of arcs/number of nodes). In Trotter's experience [14] with these and other problems with 50 
variables but higher density, the hardest problems are those of density approximately 25%, such 
as the three last ones in the above tableau. The use of the LP-bound decreases the number of itera- 
tions by approximately 50% to 75% and is also helpful in cutting down storage requirements. 


As in the general set covering problem, there are a number of reductions that can be used to
cut down the size of the graph before solving the node covering problem.

For instance, if the degree of a node exceeds an upper bound on the value of a minimum cover,
then it must belong to any such cover. If a node is in a clique and only adjacent to other nodes in
the clique, there is an optimal cover that does not contain it.
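These two reductions are simple to state as code. The sketch below is ours, not the paper's: the helper names and the example graph are invented for illustration, and the graph is represented as a dictionary mapping each node to its set of neighbors.

```python
def forced_nodes(adj, upper_bound):
    """A node whose degree exceeds an upper bound on the size of a minimum
    cover must belong to every minimum cover: excluding it would force all
    of its neighbors into the cover."""
    return {v for v, nbrs in adj.items() if len(nbrs) > upper_bound}

def clique_removable(adj):
    """Nodes adjacent only to the other members of a clique: some optimal
    cover omits such a node, since the rest of the clique covers its edges."""
    out = set()
    for v, nbrs in adj.items():
        # v's neighborhood is a clique iff its neighbors are pairwise adjacent
        if all(u in adj[w] for u in nbrs for w in nbrs if u != w):
            out.add(v)
    return out

# Example: triangle {0, 1, 2} with a pendant node 3 attached to node 0.
adj = {0: {1, 2, 3}, 1: {0, 2}, 2: {0, 1}, 3: {0}}
```

With the upper bound 2 (the cover {0, 1} is feasible), node 0 has degree 3 and is forced into every minimum cover, while nodes 1, 2 and 3 are each adjacent only to the rest of a clique and may be discarded.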

Features such as these were not incorporated in our present code which, as mentioned above, 
was mainly intended as a tool for testing the merits of our approach and for allowing us to experi- 
ment with different versions of the major components of the algorithm. 

A more general problem than the one considered in this paper is the weighted node covering
problem, i.e.,

              Min by

(NC_b)        A^T y ≥ e,

              y ∈ {0, 1}^p,

where b is a p-vector of positive integers. We have recently found an extension of our algorithm to
this problem that employs essentially the same concepts and algorithmic steps, with appropriate
modifications. This extension is left, however, to another paper.



REFERENCES

[1] Balas, E. and H. Samuelsson, "Finding a Minimum Node Cover in an Arbitrary Graph," Management Science Research Report No. 325, GSIA, Carnegie-Mellon University (November 1973).
[2] Balas, E. and M. W. Padberg, "Set Partitioning: A Survey," SIAM Review, 18, 710-760 (1976).
[3] Balas, E. and E. Zemel, "Graph Substitution and Set Packing Polytopes," Management Science Research Report No. 384, GSIA, Carnegie-Mellon University (January 1976); to appear, Networks.
[4] Balas, E. and E. Zemel, "Critical Cutsets of Graphs and Canonical Facets of Set Packing Polytopes," Management Science Research Report No. 385, Carnegie-Mellon University (February 1976); to appear, Mathematics of Operations Research.
[5] Berge, C., Graphs and Hypergraphs (North-Holland, 1973).
[6] Chvátal, V., "On Certain Polytopes Associated with Graphs," Centre de Recherches Mathématiques, CRM-238, Université de Montréal (October 1972); revised as CRM-397 (March 1974).
[7] Edmonds, J., "Maximum Matching and a Polyhedron with 0-1 Vertices," Journal of Research of the National Bureau of Standards, 69B, 126-130 (1965).
[8] Gallai, T., "Über extreme Punkt- und Kantenmengen," Annales Universitatis Scientiarum Budapestinensis, Sectio Mathematica, 2, 133-138 (1959).
[9] Gavril, F., "Algorithms for Minimum Coloring, Maximum Clique, Minimum Covering by Cliques, and Maximum Independent Set of a Chordal Graph," SIAM Journal on Computing, 1, 180-187 (1972).
[10] Gavril, F., "Algorithms for a Maximum Clique and a Maximum Independent Set of the Circle Graph," Networks, 3, 261-273 (1973).
[11] Nemhauser, G. and L. Trotter, "Vertex Packings: Structural Properties and Algorithms," Mathematical Programming, 8, 232-248 (1975).
[12] Padberg, M., "On the Facial Structure of Set Packing Polyhedra," Mathematical Programming, 5, 199-215 (1973).
[13] Shannon, C., "The Zero-Error Capacity of a Noisy Channel," IRE Transactions on Information Theory, IT-2, 8-19 (1956).
[14] Trotter, L., "Solution Characteristics and Algorithms for the Vertex Packing Problem," Technical Report No. 168, Department of Operations Research, Cornell University (January 1973).
[15] Trotter, L., "A Class of Facet Producing Graphs for Vertex Packing Polyhedra," Technical Report No. 78, Yale University (February 1974).


W. J. Hayne and K. T. Marshall 

Naval Postgraduate School 
Monterey, California 


A Markov model of a manpower system with a two-dimensional state space and
special structure is analyzed. Examples are given from the military services. The
probabilistic properties are discussed in detail with emphasis on computation. The
basic equations of manpower stocks and flows are analyzed.


The simple fractional flow (or Markov-type) model of personnel movements through an
organization has been widely analyzed (see for example Bartholomew [1], Blumen, Kogan and
McCarthy [2], Lane and Andrew [5], and Rowland and Sovereign [8]) and has been widely applied,
especially in military manpower planning (see U.S. Navy [9]). Other models such as the "cohort"
and "chain" models (see Marshall [6] and Grinold and Marshall [3]) satisfy more realistic assump-
tions on personnel movement, but lack the convenient structure of the Markov model. The purpose
of this paper is to present an extension of the Markov model to one with a 2-dimensional state
space. The state space is chosen so that the fractional flow matrix has a special structure which
is then exploited.

In Section II the structure of the model is presented and in Section III examples are given. 
In Section IV we present the probabilistic properties of the model with emphasis on computationally 
tractable formulae. In Section V the structure of the model is exploited in the personnel stock 
and flow equations. 


We assume that for planning purposes an organization considers time in discrete periods,
and that people are counted at the end of each period. When counted, a person is assumed to
possess two characteristics i and j, and is said to be in state (i, j), where i represents the first char-
acteristic (FC), 1 ≤ i ≤ n, and j represents the second characteristic (SC), l(i) ≤ j ≤ u(i). Here
l(i) and u(i) are the lower and upper limits respectively for the SC when i is the FC. Also let

J(i) = {j | l(i) ≤ j ≤ u(i)}, the set of SC's for FC i, and let w_i be the number of elements in J(i).

* The work reported herein was supported by a grant from the U.S. Marine Corps. 




Let q_i(j, m), j, m ∈ J(i), be the fraction of people in state (i, j) in a time period who move to
state (i, m) in the next time period, and let Q_i be the w_i × w_i matrix [q_i(j, m)]. Let p_i(j, m), j ∈ J(i),
m ∈ J(i+1), be the fraction of people in state (i, j) in a time period who move to state (i+1, m)
in the next time period, and let P_i be the w_i × w_{i+1} matrix [p_i(j, m)]. A basic assumption of our
two-characteristic model is that movement in one time period from states with FC i can only be
to states with FC i or i+1, or out of the system. Following the notation of earlier papers, let Q
be the fractional flow matrix for all active states in the system. Then Q has the following structure:

              | Q_1  P_1                        |
              |      Q_2  P_2                   |
(1)       Q = |            .     .              |
              |              Q_{n-1}  P_{n-1}   |
              |                       Q_n       |

where the zero matrices have been suppressed. Throughout this paper we assume that people
can leave the system eventually from any state and thus (I − Q) has an inverse, where I is the
identity matrix. This inverse (I − Q)^{-1} is called N, the fundamental matrix (see Kemeny and
Snell [4]). For each i let A_i = (I − Q_i − P_i)1, where 1 is a vector with every element equal to 1. Then
A_i is a vector of w_i elements, each one an attrition fraction from the appropriate state.
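The block structure (1) and the attrition vectors A_i can be illustrated with a small numerical sketch (Python with NumPy; all fractions are invented for illustration, not taken from the paper). The sketch assembles Q for a three-FC system and checks that within-FC flow, promotion and attrition account for everybody:

```python
import numpy as np

# Within-FC transition fractions Q_i and promotion fractions P_i for a small
# system with w = (2, 2, 1) states per first characteristic (made-up numbers).
Qs = [np.array([[0.1, 0.3], [0.0, 0.2]]),
      np.array([[0.2, 0.1], [0.0, 0.3]]),
      np.array([[0.4]])]
Ps = [np.array([[0.2, 0.1], [0.3, 0.2]]),
      np.array([[0.5], [0.4]])]

def assemble_Q(Qs, Ps):
    """Build the block upper-bidiagonal fractional flow matrix of (1)."""
    sizes = [Qi.shape[0] for Qi in Qs]
    offs = np.concatenate(([0], np.cumsum(sizes)))
    Q = np.zeros((offs[-1], offs[-1]))
    for i, Qi in enumerate(Qs):
        Q[offs[i]:offs[i+1], offs[i]:offs[i+1]] = Qi
        if i < len(Ps):
            Q[offs[i]:offs[i+1], offs[i+1]:offs[i+2]] = Ps[i]
    return Q

Q = assemble_Q(Qs, Ps)

# Attrition vector A_i = (I - Q_i - P_i)1: one minus the total outflow to
# active states, computed row by row (there is no P_n for the last FC).
A = []
for i, Qi in enumerate(Qs):
    outflow = Qi.sum(axis=1)
    if i < len(Ps):
        outflow = outflow + Ps[i].sum(axis=1)
    A.append(1.0 - outflow)
```

Each row of Q plus the corresponding attrition fraction sums to one, as the basic assumption requires.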


(a) The LOS Model 

Let the "length of service" that a person has completed with an organization be denoted
LOS, and let the FC of a person denote his LOS. If the LOS is measured in the same time units
as the planning periods then each Q_i matrix is a zero matrix. In each planning period a person's
LOS must increase by one unit, so that Q has the structure

              | 0  P_1                |
              |    0  P_2             |
          Q = |        .    .         |
              |          0  P_{n-1}   |
              |             0         |

By appropriate choice of the second characteristic P_i often has special structure too. Consider
a hierarchical system where the "rank" or "grade" of an individual is used as his second charac-
teristic. Then the structure of each P_i depends on the organization's promotion scheme. Assume
that in one period a person either stays in the same grade or is promoted one grade. No demotions
occur. Then P_i would have the structure

                | x  x           |
                |    x  x        |
          P_i = |       .   .    |
                |         x  x   |
                |            x   |

where x represents a non-zero element.

(b) The (Grade, LOS) Model

If, as in (a), no demotions occur, and only single promotions can occur in a time period, then if
i indexes the grade, and j the length of service, then Q has the structure shown in (1). Each sub-
matrix has special structure also. Let

q_{ij} = probability a person in state (i, j) at the end of one period will be in state (i, j+1) at the
end of the next period,

p_{ij} = probability a person in state (i, j) at the end of one period will be in state (i+1, j+1)
at the end of the next period.

The transition matrix Q_i has non-zero elements only immediately above the main diagonal:

                | 0  q_{i,l(i)}                        |
                |    0  q_{i,l(i)+1}                   |
          Q_i = |        .        .                    |
                |              0  q_{i,u(i)-1}         |
                |                 0                    |

The transition matrix P_i has non-zero elements only on a single diagonal. If l(i+1) ≥ l(i) + 1 and
u(i+1) ≥ u(i) + 1, then P_i has the form shown below, where:

(1) the top max {0, l(i+1) − (l(i)+1)} rows are zeros,

(2) the last max {0, u(i+1) − (u(i)+1)} columns are zeros,

and the non-zero diagonal contains the elements p_{i,l(i+1)-1}, p_{i,l(i+1)}, . . ., p_{i,u(i)}.

If l(i+1) ≤ l(i), the first l(i) + 1 − l(i+1) columns of P_i are zeros. If u(i+1) ≤ u(i), the bottom
u(i) + 1 − u(i+1) rows of P_i are zeros. Under any circumstances P_i has only one non-zero diagonal,
and we call such a matrix a diagonal matrix.


(c) The (Grade, TIG) Model

In certain applications a person's "time in grade," denoted TIG, is more important than his
time in the system (LOS). If again we allow no demotions and only single promotions per period,
and if the FC indexes the grade and the SC the TIG for the appropriate grade, then Q has the same
structure (1). Each Q_i has the same structure as in (b), but now l(i) = 1 for each grade i. However,
each matrix P_i has a single column of non-zeros,

          P_i = | p_i  0  . . .  0 |,

since promotions to the next highest grade always lead to a TIG of 1. Here p_{ij} is the fraction of those
in grade i, with time in grade i equal to j, who are promoted to grade i+1.

If demotions are not allowed and only single promotions can occur per period, then grade is a
characteristic which can be used as the FC. A larger number of possibilities occur for the SC. In
addition to those above some useful ones are (i) skill category, (ii) physical location in a multi-
location organization, and (iii) educational level. Note that educational level could also be used as
the FC.


Let T_i be the set of states associated with FC i; thus

          T_i = {(i, j) | j ∈ J(i)},    i = 1, 2, . . ., n.

Also let w_i = u(i) − l(i) + 1, the number of states in T_i (and J(i)). Finally let T_0 be the single state
"out of the system."

In this section we develop the probabilistic properties of:

(1) any set of states T_i,

(2) any union of consecutively indexed sets T_i, i.e. T_i ∪ T_{i+1} ∪ . . . ∪ T_k for i ≤ k,

(3) the union of all transient states, which we call T.

One of the purposes of this development is to show that the stochastic properties of Q, typically
a large matrix, are readily calculated in terms of the smaller matrices Q_i and P_i, and as seen in
Section III these often have extremely simple structure which leads to simple computation.

The format of this section follows closely that of Chapter 3 of Kemeny and Snell [4]. The
notation (K&S, 3.x.y) indicates that a result follows from Theorem 3.x.y in Kemeny and
Snell, albeit usually not directly.



(a) First-Order Properties

Recall that we assume the system matrix Q has a fundamental matrix N = (I − Q)^{-1}, and each
element of N is the expected number of visits to the column state starting from the row state
(K&S, 3.2.4). Since Q has the structure shown in (1),

              | N_1   N_1 P_1 N_2   . . .   ( Π_{t=1}^{n-1} N_t P_t ) N_n |
              |       N_2           . . .   ( Π_{t=2}^{n-1} N_t P_t ) N_n |
          N = |                 .                       .                 |
              |                                        N_n                |

where the (i, k) block, for i < k, is ( Π_{t=i}^{k-1} N_t P_t ) N_k, and

(2)       N_i = (I − Q_i)^{-1},    i = 1, . . ., n,

is the fundamental matrix for FC i. Note that the large matrix N is completely determined by the
matrices N_i and P_i. Thus the only matrix inversions required are those of (I − Q_i), i = 1, . . ., n.
This is of considerable computational significance because, as previously mentioned, Q is usually
a large matrix.

Each matrix N_i has a probabilistic interpretation. We pursue this interpretation and show
that these matrices can be used to determine other probabilistic properties of interest.
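The computational point can be checked numerically: assembling N from the small blocks agrees with inverting the full I − Q directly. A sketch with invented two-FC data (Python with NumPy):

```python
import numpy as np

# Invented within-FC and promotion fractions for a two-FC system.
Q1 = np.array([[0.1, 0.3], [0.0, 0.2]])
Q2 = np.array([[0.2, 0.1], [0.0, 0.3]])
P1 = np.array([[0.2, 0.1], [0.3, 0.2]])

# Only the small blocks need inverting: N_i = (I - Q_i)^{-1}, equation (2).
N1 = np.linalg.inv(np.eye(2) - Q1)
N2 = np.linalg.inv(np.eye(2) - Q2)

# Assemble N blockwise: diagonal blocks N_1, N_2; off-diagonal block N_1 P_1 N_2.
N_block = np.block([[N1, N1 @ P1 @ N2],
                    [np.zeros((2, 2)), N2]])

# For comparison, invert the full (I - Q) of the assembled system matrix.
Q = np.block([[Q1, P1], [np.zeros((2, 2)), Q2]])
N_full = np.linalg.inv(np.eye(4) - Q)
```

The two computations coincide; for a real system with many FC's, only the small w_i × w_i inversions would be performed.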

In this section we make numerous definitions and denote the k-th one by Dk.

Let us consider first the properties associated with a single set of states T_i and define:

D1. v_i(j, m) = expected number of visits to state (i, m) given that FC i is entered in state (i, j),
    V_i = a w_i × w_i matrix having v_i(j, m) as the element in row j − l(i) + 1 and column
    m − l(i) + 1.

From (2) the element of N_i in row j − l(i) + 1 and column m − l(i) + 1 equals the expected
number of visits to state (i, m) given that FC i is entered in state (i, j) (K&S, 3.2.4). So, from
definition D1, we have,

(3)       V_i = N_i.

Note that the rows and columns of N_i and V_i correspond to states in T_i in the same order as the
rows and columns of Q_i.

We next define:

D2. τ_i(j) = expected time in FC i given that FC i is entered in state (i, j),
    τ_i = [τ_i(l(i)), . . ., τ_i(u(i))], a w_i × 1 vector.

The expected time spent in FC i equals the sum of the expected number of visits to the various
states in FC i. From (3) and D2,

          τ_i(j) = component (j − l(i) + 1) of N_i 1,

(4)       τ_i = N_i 1, a w_i × 1 vector,

where 1 is a vector with all components equal to one.

We next turn our attention to where the process goes when it leaves FC i. The process upon
leaving T_i must enter either T_{i+1} (if i < n) or T_0. Next define:

D3. b_i(j, m) = probability of entering FC i+1 in state (i+1, m) given that FC i is entered in
    state (i, j),
    B_i = a w_i × w_{i+1} matrix having b_i(j, m) as the element in row j − l(i) + 1 and column
    m − l(i+1) + 1,

D4. b_i(j) = probability of ever entering T_{i+1} given that FC i is entered in state (i, j),
    b_i = [b_i(l(i)), . . ., b_i(u(i))], a w_i × 1 vector,

D5. b_{i0}(j) = probability of never entering T_{i+1} given that FC i is entered in state (i, j),
    b_{i0} = [b_{i0}(l(i)), . . ., b_{i0}(u(i))], a w_i × 1 vector.

From these definitions it follows that

(5)       B_i = N_i P_i, a w_i × w_{i+1} matrix (K&S, 3.5.4),

          b_i = B_i 1, a w_i × 1 vector,

          b_{i0} = 1 − b_i
                 = N_i A_i, a w_i × 1 vector.

The matrix B_i is particularly useful in manpower policy analyses. For example let f_i be a
1 × w_i vector of the numbers of people entering the states of T_i. Then f_i B_i is a 1 × w_{i+1} vector of the numbers of
these people who will eventually enter T_{i+1} (K&S, 3.3.6). Thus B_i can be used to reveal the pro-
motion structure in either the (Grade, LOS) or (Grade, TIG) models.
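The use of B_i to push an entering cohort forward can be sketched numerically (Python with NumPy; the fractions and cohort sizes are invented for illustration):

```python
import numpy as np

# Invented fractions for one FC with two states, and its promotion matrix.
Q1 = np.array([[0.1, 0.3], [0.0, 0.2]])
P1 = np.array([[0.2, 0.1], [0.3, 0.2]])

N1 = np.linalg.inv(np.eye(2) - Q1)
B1 = N1 @ P1                 # eq. (5): probabilities of eventual entry into T_2

f1 = np.array([100.0, 50.0]) # people entering the two states of T_1
promoted = f1 @ B1           # expected numbers eventually entering each state of T_2
```

Each row of B_1 sums to at most one (the deficit is the attrition probability), so the promoted cohort is smaller than the entering one.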

Next we consider the first-order properties related to FC's i and k where i < k. Define:

D6. b((i, j), (k, m)) = probability of entering FC k in state (k, m) given that FC i is entered
    in state (i, j),
    B_{ik} = a w_i × w_k matrix having b((i, j), (k, m)) as the element in row j − l(i) + 1
    and column m − l(k) + 1.

From definitions D3 and D6 and a simple conditioning argument we have,

          b((i, j), (i+2, m)) = Σ_r b_i(j, r) b_{i+1}(r, m),

          B_{i,i+2} = B_i B_{i+1}.

Notice from D6 that B_{ii} is an identity matrix and from D3 that B_{i,i+1} = B_i. More generally it can
be shown that for i < k,

          B_{ik} = Π_{r=i}^{k-1} B_r, a w_i × w_k matrix.



D7. v((i, j), (k, m)) = expected number of visits to state (k, m) given that FC i is entered in
    state (i, j),
    V_{ik} = a w_i × w_k matrix having v((i, j), (k, m)) as the element in row j − l(i) + 1
    and column m − l(k) + 1,

D8. b_{ik}(j) = probability of ever entering FC k, given that FC i is entered in state
    (i, j),
    b_{ik} = [b_{ik}(l(i)), . . ., b_{ik}(u(i))], a w_i × 1 vector.

Considering each row of B_{ik} as the part of an initial probability vector that applies to T_k, we then
have

(6)       V_{ik} = B_{ik} N_k, a w_i × w_k matrix (K&S, 3.5.4),

          b_{ik} = B_{ik} 1, a w_i × 1 vector.

D9. τ_{ik}(j) = expected time in FC k given that FC i was entered in state (i, j),
    τ_{ik} = [τ_{ik}(l(i)), . . ., τ_{ik}(u(i))], a w_i × 1 vector.

The expected time in an FC is the sum of the expected number of visits to states in that FC, so

          τ_{ik} = V_{ik} 1, a w_i × 1 vector.

This completes our study of the first-order properties related to the various FC's of the system.
The foregoing definitions by no means exhaust the first-order properties of the two-characteristic
model that might conceivably be of interest. It is felt, however, that these properties will often be
of practical interest and that other first-order properties may be readily derived from those given
here.

(b) Two Special Cases

The elements of the fundamental matrix for FC i, N_i, have a somewhat different interpretation
when the states in FC i have what we call the "0-1 visiting property." We say that a state has the
0-1 visiting property if the state can be visited no more than one time. Important examples of
two-characteristic models in which all transient states have the 0-1 visiting property are the models
in which the FC or SC is length of service or where the SC is time in grade.

If each state in T_i has the 0-1 visiting property, then the expected number of visits to a state
in T_i is equal to the probability of visiting the state. The element of N_i in row j − l(i) + 1 and column
m − l(i) + 1 may then be interpreted as the probability of visiting state (i, m) given that FC i is
entered in state (i, j).

Another property of interest is the "no return property." We say that a state has the
no return property if it is impossible to ever make a transition into the state after a transition has
been made out of it. The 0-1 visiting property implies the no return property, but they are
not equivalent. For example, in modeling manpower flows in the U.S. Civil Service one might use
"GS grade" as the FC and "pay step" as the SC. Each state is then a couple (grade, pay step). A
person can stay in the same pay step for more than one period, so if there are no demotions then
each state would have the no return property but not the 0-1 visiting property.

If the states in T_i have the no return property then it is possible to order the states in T_i so
that Q_i is upper triangular. When Q_i is upper triangular so is I − Q_i, and the computation of the
inverse of I − Q_i, i.e. the fundamental matrix N_i for FC i, is considerably easier than in the general
case.

If the states in T_i have the 0-1 visiting property, then not only is N_i upper triangular but also
the elements of N_i on the main diagonal are all ones.
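Under the 0-1 visiting property Q_i is strictly upper triangular, so the series for N_i terminates and its diagonal consists of ones. A small numerical sketch (invented aging fractions, Python with NumPy):

```python
import numpy as np

# With SC = time in grade, a state (i, j) can only be left for (i, j+1) or for
# out of the system, so Q_i is strictly upper triangular (made-up fractions).
Qi = np.array([[0.0, 0.6, 0.0],
               [0.0, 0.0, 0.5],
               [0.0, 0.0, 0.0]])

Ni = np.linalg.inv(np.eye(3) - Qi)
# Because Qi is nilpotent, N_i = I + Q_i + Q_i^2 exactly, each entry being the
# probability of ever visiting the column state from the row state.
```

The unit diagonal reflects the single guaranteed "visit" to the starting state itself.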

(c) Second Moment Properties

The format in this section follows closely that of Section (a), but here we are concerned with
various second moment properties of the two-characteristic model.

D10. v_{2,i}(j, m) = variance of the number of visits to state (i, m) given that FC i is entered
     in state (i, j),
     V_{2,i} = a w_i × w_i matrix having v_{2,i}(j, m) as the element in row j − l(i) + 1 and column
     m − l(i) + 1.

Following (K&S, 3.3.3),

          V_{2,i} = N_i (2(N_i)_dg − I) − (N_i)_sq,

where for any matrix A, A_dg and A_sq both have the same dimensions as A; A_dg is defined
when A is square and is formed by setting all elements in A not on the main diagonal to zero; A_sq
is formed by squaring all the elements in A.

D11. τ_{2,i}(j) = variance of the time spent in FC i given that FC i is entered in state (i, j),
     τ_{2,i} = [τ_{2,i}(l(i)), . . ., τ_{2,i}(u(i))], a w_i × 1 vector.

Following (K&S, 3.3.5),

          τ_{2,i} = (2N_i − I)τ_i − (τ_i)_sq.
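The mean and variance of the time spent in an FC come directly from N_i; a minimal sketch (invented Q_i, Python with NumPy):

```python
import numpy as np

Qi = np.array([[0.1, 0.3], [0.0, 0.2]])     # made-up within-FC fractions
Ni = np.linalg.inv(np.eye(2) - Qi)

tau = Ni @ np.ones(2)                       # expected time in the FC, eq. (4)
tau2 = (2 * Ni - np.eye(2)) @ tau - tau**2  # variance of that time (K&S, 3.3.5)
```

For the second state, which can only repeat itself with fraction 0.2 before leaving, the time spent is geometric with mean 1/0.8 = 1.25 and variance 0.2/0.8^2 = 0.3125, which the formula reproduces.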


D12. v_2((i, j), (k, m)) = variance of the number of visits to state (k, m) given that FC i is
     entered in state (i, j),
     V_2(i, k) = a w_i × w_k matrix having v_2((i, j), (k, m)) as the element in row j − l(i)
     + 1 and column m − l(k) + 1.

Following (K&S, 3.3.6),

          V_2(i, k) = V_{ik} (2(N_k)_dg − I) − (V_{ik})_sq.

D13. τ_2((i, j), k) = variance of time spent in FC k given that FC i is entered in state (i, j),
     τ_2(i, k) = [τ_2((i, l(i)), k), . . ., τ_2((i, u(i)), k)], a w_i × 1 vector.

Following (K&S, 3.3.6),

          τ_2(i, k) = B_{ik} (2N_k − I)τ_k − (τ_{ik})_sq.

If each state in T_k has the 0-1 visiting property, then the diagonal elements of N_k are equal
to 1, and,

          (N_k)_dg = I,

          V_2(i, k) = V_{ik} − (V_{ik})_sq.

(d) Matrices of t-Step Transition Probabilities

In this section we consider the probability of being in state (k, m) t steps after being in state
(i, j). The matrices of these probabilities are called the t-step transition matrices. They are used in
Section V to represent the stock vectors as a sum of steady-state and transient components.

D14. m(t; (i, j), (k, m)) = probability of being in state (k, m) t steps after being in state (i, j),
     t = 0, 1, 2, . . .,
     M_{ik}(t) = a w_i × w_k matrix having m(t; (i, j), (k, m)) as the element in row
     j − l(i) + 1 and column m − l(k) + 1.

The rows of M_{ik}(t) are associated with states in T_i; the columns of M_{ik}(t) are associated with states
in T_k.

We have immediately that

          M_{ii}(0) = I.

From our assumptions on the structure of Q we have,

          M_{ik}(t) = 0 if i > k,

          M_{ik}(t) = 0 if t < k − i.

If the process is to be in state (k, m) exactly t steps after being in state (i, j), then it must
be in some state with FC k or k−1 exactly t−1 steps after being in state (i, j). Conditioning on
this fact leads to the recursive equation,

(7)       M_{ik}(t) = M_{ik}(t−1) Q_k + M_{i,k-1}(t−1) P_{k-1},    t = 1, 2, . . . .

For any i and k the sum over t of the probability matrices M_{ik}(t) gives the matrix of the ex-
pected number of visits to states with FC k starting from states with FC i. So we have,

(8)       Σ_{t=0}^{∞} M_{ik}(t) = V_{ik},    i ≤ k,
                               = 0, otherwise.

Recall that the Q_i matrices are transient, so V_{ik} is a matrix of finite elements. This implies that,

(9)       lim_{t→∞} M_{ik}(t) = 0.

From (7) it can be shown by an inductive argument that

(10)      M_{ik}(t) = Σ_{r=0}^{t-1} M_{i,k-1}(t−1−r) P_{k-1} Q_k^r.

The t-step transition matrices provide a rather comprehensive picture of how people move
through a two-characteristic system.
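Recursion (7) and identity (8) can be checked numerically. The sketch below (invented two-FC data, Python with NumPy) builds M_12(t) from (7), using M_11(t) = Q_1^t, and sums it against V_12 = N_1 P_1 N_2:

```python
import numpy as np

# Invented within-FC and promotion fractions for a two-FC system.
Q1 = np.array([[0.1, 0.3], [0.0, 0.2]])
Q2 = np.array([[0.2, 0.1], [0.0, 0.3]])
P1 = np.array([[0.2, 0.1], [0.3, 0.2]])

def M12(t):
    """t-step probabilities of being in a T_2 state starting from a T_1 state,
    via recursion (7): M_12(t) = M_12(t-1) Q_2 + M_11(t-1) P_1."""
    M, M11 = np.zeros((2, 2)), np.eye(2)
    for _ in range(t):
        M = M @ Q2 + M11 @ P1   # uses M_11 at the previous step
        M11 = M11 @ Q1          # advance M_11(t) = Q_1^t
    return M

# Identity (8): the M_12(t) sum (here truncated) to V_12 = N_1 P_1 N_2.
total = sum((M12(t) for t in range(100)), np.zeros((2, 2)))
N1 = np.linalg.inv(np.eye(2) - Q1)
N2 = np.linalg.inv(np.eye(2) - Q2)
V12 = N1 @ P1 @ N2
```

Because the Q_i are transient, the terms decay geometrically and the truncated sum agrees with V_12 to machine precision.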

(e) Conditioning on Promotion When the FC is Grade

In manpower planning one is often interested in conditional probabilities, e.g. the probability
of attaining grade k given that grade i is attained. The stochastic properties of the transient ma-
trix Q under conditioning on promotion when the FC is grade are briefly developed in this section.

Define:

D15. (i, j; t) = the event "in state (i, j) at time t,"
     T*_k = the event "a transition is made into T_k before leaving the system."

Conditioning on the event T*_k is the same as conditioning on promotion to grade k.

Define:

D16. q_i(j, m) = Pr[(i, m; t+1) | (i, j; t)],
     q*_i(j, m) = Pr[(i, m; t+1) | (i, j; t), T*_{i+1}].

Provided that Pr[T*_{i+1} | (i, j; t)] ≠ 0, we have by conditional arguments,

(11)      q*_i(j, m) = q_i(j, m) × b_i(m)/b_i(j).

D17. C_i = a w_i × w_i matrix having the elements of b_i (see D4) on its main diagonal and zeros
     elsewhere.

We assume that promotion to grade i+1 is possible from every state in T_i. Under this as-
sumption C_i^{-1} exists. If promotion to grade i+1 is impossible from some state (i, j) then we must
avoid conditioning on an impossible event. This is readily accomplished by temporarily treating
state (i, j) as part of T_0 (out of the system) and redefining J(i), Q_i, P_i and A_i accordingly.

Define:

D18. Q*_i = a w_i × w_i matrix having q*_i(j, m) as the element in row j − l(i) + 1 and column
     m − l(i) + 1.

Then from (11) and D17,

          Q*_i = C_i^{-1} Q_i C_i.

The matrix Q*_i is the matrix of within-grade one-step transition probabilities conditioned on the
attainment of grade i+1.
Define:

D19. p_i(j, m) = Pr[(i+1, m; t+1) | (i, j; t)],
     p*_i(j, m) = Pr[(i+1, m; t+1) | (i, j; t), T*_{i+1}],
     P*_i = a w_i × w_{i+1} matrix having p*_i(j, m) as the element in row j − l(i) + 1 and column
     m − l(i+1) + 1.

We then have,

          p*_i(j, m) = p_i(j, m) × 1/b_i(j).

Thus from D17 and D19,

          P*_i = C_i^{-1} P_i.

The matrix P*_i is the matrix of one-step promotion probabilities conditioned on the attainment
of grade i+1.

Because (Q*_i)^t = C_i^{-1} Q_i^t C_i,
the fundamental matrix for grade i is, when we condition on promotion to grade i+1,

          N*_i = (I − Q*_i)^{-1} = C_i^{-1} N_i C_i.

Define:

D20. v*_i(j, m) = expected number of visits to state (i, m) given that grade i is entered in state
     (i, j) and grade i+1 is attained,
     V*_i = a w_i × w_i matrix having v*_i(j, m) as the element in row j − l(i) + 1 and column
     m − l(i) + 1,

D21. b*_i(j, m) = probability of entering grade i+1 in state (i+1, m) given that grade i is
     entered in state (i, j) and grade i+1 is attained,
     B*_i = a w_i × w_{i+1} matrix having b*_i(j, m) as the element in row j − l(i) + 1 and column
     m − l(i+1) + 1.

Then one may show that,

          V*_i = N*_i = C_i^{-1} N_i C_i,

          B*_i = N*_i P*_i = C_i^{-1} B_i.

Note that B*_i is simply B_i with its rows normalized, but Q*_i is not simply a row normalized form
of Q_i.

As with the matrices B_i, products of matrices B*_i with successive indices are well defined:
their meaning is that of a matrix B_{ik} as defined in D6 with conditioning on attainment of grade k.

The conditioned and unconditioned matrices may be used together. For example, the elements
of B*_i B_{i+1} give the probabilities of entering grade i+2 in the column state conditioned on starting
from the row state in T_i and attaining grade i+1.
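The conditioned matrices are straightforward to compute; a sketch with invented fractions (Python with NumPy) forms C_i from b_i and verifies the row normalization of B*_i:

```python
import numpy as np

# Conditioning on promotion: C_i holds b_i on its diagonal, Q_i* = C_i^{-1} Q_i C_i,
# and B_i* = C_i^{-1} B_i is B_i with its rows normalized (made-up fractions).
Qi = np.array([[0.1, 0.3], [0.0, 0.2]])
Pi = np.array([[0.2, 0.1], [0.3, 0.2]])

Ni = np.linalg.inv(np.eye(2) - Qi)
Bi = Ni @ Pi                      # eq. (5)
bi = Bi @ np.ones(2)              # probability of ever reaching grade i+1 (D4)
Ci = np.diag(bi)

Qi_star = np.linalg.inv(Ci) @ Qi @ Ci
Bi_star = np.linalg.inv(Ci) @ Bi
```

The similarity transform also carries the fundamental matrix along: (I − Q_i*)^{-1} equals C_i^{-1} N_i C_i.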


We begin by defining the terms "stocks" and "flows," and then discuss why stocks and flows 
are important in manpower planning models. Next the relations between stocks and flows in a two- 
characteristic model are developed. Finally, we show how the stocks can be represented as the sum 
of a "steady-state" component and a "transient" component. 

(a) Definitions and Background 

A period is the interval of time from immediately after an integer value of the time parameter
t up to and including the next integer value of t. A period is identified by the value of the time
parameter at the end of the period. Thus,

          period t_1 = {t : t_1 − 1 < t ≤ t_1},

where t_1 is an integer.

The number of people in a state at the end of a period is referred to as the "stock" in that state. 
Thus, stocks are counted only at integer values of the time parameter t. 

The number of people who change their status in the system from one state to another during 
any period is referred to as a "flow." Flows occur during a period, but we do not specify the exact 
time at which they occur. 

Stocks and flows are of primary importance in most manpower planning models. The most 
obvious reason for this is that costs are closely related to stocks and flows, e.g., total payroll depends
on stocks; transportation costs or retraining costs depend on flows. Recruiting policy and promotion 
policy depend in the short term on present stocks and in the long term on how we model future 
stocks and flows. Determining the feasibility of a retirement plan and evaluating the effects of a 
change in billet structure are other instances in which the planner needs to be able to model stocks 
and flows in a manpower system. 

We now define the variables that are used to model the stocks and flows in the two-characteristic
model. Recall that T_i is the set of states associated with FC i, w_i is the number of states in T_i,
and for convenience of notation we assume the second characteristic takes on successive integer
values for FC i.

In a Markov model the stocks and flows are in general random variables. In this section we
deal only with the expected values of stocks and flows. Such a model is called a "fractional flow
model" because the transition probabilities of the Markov model are in effect treated as fractions
which direct flows through the system in a deterministic manner.

          s_{ij}(t) = expected stock in state (i, j) at time t,

          s_i(t) = (s_{i,l(i)}(t), . . ., s_{i,u(i)}(t)), a 1 × w_i vector of expected stocks in T_i,

          s(t) = (s_1(t), . . ., s_n(t)), a 1 × Σ w_i vector of expected stocks in the system.


By our basic assumption, flows into any state in T_i must come from a state in either T_i or
T_{i-1}. We also make provision in our model for "external flows." The source of such flows is un-
specified. However, we may consider external flows as consisting of people hired into the system.
The external flows may be deterministic or random, but we deal only with their expected values.

          d_{ij}(t) = expected flow from states in T_i to state (i, j) during period t, a scalar;

          d_i(t) = (d_{i,l(i)}(t), . . ., d_{i,u(i)}(t)), a 1 × w_i vector;

          f_{ij}(t) = expected external flow into state (i, j) during period t, a scalar;

          f_i(t) = (f_{i,l(i)}(t), . . ., f_{i,u(i)}(t)), a 1 × w_i vector;

          g_{ij}(t) = expected flow from states in T_{i-1} to state (i, j) during period t, a scalar;

          g_i(t) = (g_{i,l(i)}(t), . . ., g_{i,u(i)}(t)), a 1 × w_i vector.

When i = 1, g_{ij}(t) is defined to be zero.

The relation between the flow vectors and the stock vector in grade i is depicted in Figure 1,
where "T_i; t" denotes the states with FC i at time t.



Figure 1. Stocks and Flows with FC i in Period t. 

(b) Basic Stock Equation

Clearly, from our assumptions,

          s_i(t) = d_i(t) + f_i(t) + g_i(t).

(See Figure 1.)

It will be convenient to define,

          s_0(t) = 0, a vector of zeros,

          P_0 = 0, a matrix of zeros.

Using conditional expectation we then have

          d_i(t) = s_i(t−1) Q_i,    i = 1, . . ., n,

          g_i(t) = s_{i-1}(t−1) P_{i-1},    i = 1, . . ., n.

The basic stock equation is then,

(12)      s_i(t) = s_i(t−1) Q_i + f_i(t) + s_{i-1}(t−1) P_{i-1},    i = 1, . . ., n.

The basic stock equation for FC i can be written in terms of the expected or actual stocks
with FC i−1 in previous periods. By recursively applying the basic stock equation for s_i(t), s_i(t−1), . . .,
s_i(1) one obtains

(13)      s_i(t) = s_i(0) Q_i^t + Σ_{r=0}^{t-1} f_i(t−r) Q_i^r + Σ_{r=0}^{t-1} s_{i-1}(t−r−1) P_{i-1} Q_i^r,

          t = 0, 1, 2, . . ., i = 1, . . ., n,

which we will refer to as the cumulative stock equation.

Equations (12) and (13) are used frequently in the remainder of this paper. Some manpower
models used in the U.S. military for short-range forecasting consist principally of an equation
similar to (12).
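A one-step projection with the basic stock equation (12) can be sketched in a few lines (two FC's; the fractions, stocks and external flows are invented for illustration):

```python
import numpy as np

# Invented fractions for a two-FC system.
Q1 = np.array([[0.1, 0.3], [0.0, 0.2]])
Q2 = np.array([[0.2, 0.1], [0.0, 0.3]])
P1 = np.array([[0.2, 0.1], [0.3, 0.2]])

def step(s1, s2, f1, f2):
    """s_i(t) = s_i(t-1) Q_i + f_i(t) + s_{i-1}(t-1) P_{i-1}, with P_0 = 0."""
    return s1 @ Q1 + f1, s2 @ Q2 + f2 + s1 @ P1

s1, s2 = np.array([100.0, 50.0]), np.array([80.0, 40.0])   # stocks at t-1
f1, f2 = np.array([20.0, 0.0]), np.array([0.0, 0.0])       # external flows
s1, s2 = step(s1, s2, f1, f2)
```

Iterating the step function from given initial stocks reproduces the cumulative stock equation (13).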

(c) Transient Properties of the Stocks

In this section we develop a method for expressing the stock vector as a sum of a "steady-state"
component and a "transient" component. This method helps one to understand how the stock
vectors change in going from any present stock vector to future stock vectors. This method also
helps one interpret the character of the limiting stock vector.

We do not want to restrict ourselves to cases in which the stock vector converges (as t increases)
to a finite vector. We say that the vector function s̄_i(t) is a steady-state component of the stock
vector s_i(t) if,

          lim_{t→∞} (s_i(t) − s̄_i(t)) = 0.

For any sequence of stock vectors <s_i(t)> there is more than one choice of the steady-state com-
ponent. In applications one would prefer a steady-state component having a relatively simple
mathematical form. We show that in some cases a judicious choice of s̄_i(0) makes this possible. The
following theorem shows the properties of a class of steady-state components which are quite useful.
THEOREM: For any collection of 1 × w_i vectors s̄_i(0), i = 1, . . ., n, let the vector functions
s̄_i(t) satisfy

          s̄_i(t) = s̄_i(t−1) Q_i + f_i(t) + s̄_{i-1}(t−1) P_{i-1},    t = 1, 2, . . ., i = 1, . . ., n,

i.e. the vector functions s̄_i(t) satisfy the basic stock equation (12). Then

(i) the actual stocks at time t are

          s_i(t) = s̄_i(t) + Σ_{k=1}^{i} (s_k(0) − s̄_k(0)) M_{ki}(t),

(ii)      Σ_{t=0}^{∞} (s_i(t) − s̄_i(t)) = Σ_{k=1}^{i} (s_k(0) − s̄_k(0)) B_{ki} N_i,

a 1 × w_i vector having finite components,

(iii) s̄_i(t) is a steady-state component of the stock vector s_i(t), i.e.

          lim_{t→∞} (s_i(t) − s̄_i(t)) = 0.

Before proving the theorem we explain why one might be interested in such a theorem. Part
(iii) of the theorem says that s̄_i(t) is a steady-state component of the stock vector s_i(t), and part
(i) shows how the stock vector s_i(t) can be expressed as the sum of a steady-state component and a
transient component. Part (ii) of the theorem says that the total over all periods of the difference
between the stock vector and its steady-state component is a readily calculated finite vector.

Such information can be useful when long-range planning has been done using an "equilibrium 
model." As an example consider an organization which intends to change from its present size 
of 250,000 to a size of 200,000. The manpower planner may use an equilibrium model to develop 
policies that are in some sense optimal, and these policies will maintain the organization at 200,000 
people once it has been reduced to this size. So the equilibrium model tells the planner what to 
do once the size of the organization reaches the desired equilibrium level but it doesn't tell him 
how to change the organization from the present level (250,000) to the desired equilibrium level 
(200,000). This problem of finding an optimal transition policy to go from present stock levels 
to a future equilibrium stock distribution is a very difficult one (see [1] Chapter 4). One method 
for making the transition is to immediately implement the hiring, promotion and attrition policies 
that have been derived from the equilibrium model. Because of the transient nature of the system 
these policies will eventually bring the stocks in the system to their equilibrium levels. 

In the theorem the vector functions s̄_i(t) play the role of what the stocks would be at time t if
the system were in equilibrium. The stock vectors s_i(t) indicate what the stocks will be at time t
if we start with the present stocks s_i(0) and implement the policies of the equilibrium model (which
are reflected in the external flows f_i(t) and the transition matrices Q_i, P_i and A_i). From part (i)
of the theorem we may readily calculate the difference between actual stocks and equilibrium 
stocks in any grade and any period. If there is a penalty associated with having more people than 
the equilibrium stocks in the system, then part (ii) of the theorem may be used to calculate the 
total penalty. Part (iii) of the theorem assures the planner that the difference between the actual 
and equilibrium stocks does converge to a zero vector as the time parameter t increases. 

The proof of the theorem follows. 

PROOF: By hypothesis the vector functions s̄_i(t) satisfy the basic stock equation (12), so
they must also satisfy the cumulative stock equation (13):

s̄_i(t) = s̄_i(0) Q_i^t + Σ_{r=0}^{t−1} f_i(t−r) Q_i^r + Σ_{r=0}^{t−1} s̄_{i−1}(t−r−1) P_{i−1} Q_i^r.

Of course the stock vectors s_i(t) also satisfy the cumulative stock equation (13), so we have

s_i(t) − s̄_i(t) = (s_i(0) − s̄_i(0)) Q_i^t + Σ_{r=0}^{t−1} (s_{i−1}(t−r−1) − s̄_{i−1}(t−r−1)) P_{i−1} Q_i^r.


When i=1 this implies

s_1(t) = s̄_1(t) + (s_1(0) − s̄_1(0)) Q_1^t,

so we have shown that part (i) of the theorem is true when i=1. Suppose part (i) of the theorem
is true for grade i−1, i.e.

s_{i−1}(t) = s̄_{i−1}(t) + Σ_{k=1}^{i−1} (s_k(0) − s̄_k(0)) M_{k,i−1}(t).

Then

s_{i−1}(t−r−1) − s̄_{i−1}(t−r−1) = Σ_{k=1}^{i−1} (s_k(0) − s̄_k(0)) M_{k,i−1}(t−r−1),

and substituting into the previous expression,

s_i(t) − s̄_i(t) = (s_i(0) − s̄_i(0)) Q_i^t + Σ_{r=0}^{t−1} Σ_{k=1}^{i−1} (s_k(0) − s̄_k(0)) M_{k,i−1}(t−r−1) P_{i−1} Q_i^r

= (s_i(0) − s̄_i(0)) Q_i^t + Σ_{k=1}^{i−1} (s_k(0) − s̄_k(0)) Σ_{r=0}^{t−1} M_{k,i−1}(t−r−1) P_{i−1} Q_i^r.

From equation (10) in Section IV

Σ_{r=0}^{t−1} M_{k,i−1}(t−r−1) P_{i−1} Q_i^r = M_{ki}(t),

so we have shown by induction that

s_i(t) − s̄_i(t) = (s_i(0) − s̄_i(0)) Q_i^t + Σ_{k=1}^{i−1} (s_k(0) − s̄_k(0)) M_{ki}(t),

which, with the convention M_{ii}(t) = Q_i^t, is part (i). This proves part (i) of the theorem.

From part (i),

Σ_{t=0}^{∞} (s_i(t) − s̄_i(t)) = Σ_{t=0}^{∞} Σ_{k=1}^{i} (s_k(0) − s̄_k(0)) M_{ki}(t)

= Σ_{k=1}^{i} (s_k(0) − s̄_k(0)) B_{ki} N_i,

a 1×w_i vector having finite components.
The last step above follows from equations (6) and (8) of Section IV. This proves part (ii) of the
theorem.

Part (iii) follows from the fact that the sum in part (ii) is finite, and the proof of the theorem is
complete.
The utility of this approach depends on our ability to find vectors s̄_i(0) such that the vector
functions s̄_i(t) are simple and readily calculated. Some examples follow.

1. Fixed External Flows 

The equilibrium models previously mentioned enjoy some popularity in military manpower
planning in the United States (see for example [7]). The rationale underlying the use of such models
is that one should determine the organization structure, and the policies to maintain this structure,
which are optimal (or "least infeasible"). Among the policies derived from an equilibrium model
is the hiring policy. This had the form

f_i(t) = f_i,  t = 1, 2, . . .,  i = 1, . . ., n,

where the vector of the number of people to be hired into the states in grade i each period, f_i, is
specified from the equilibrium model.


Then using (12) it is easy to show that

s̄_1(t) = f_1 N_1  for all t.

Thus, from the theorem,

s_1(t) = s̄_1(t) + (s_1(0) − s̄_1(0)) M_{11}(t).

Now recursively define

s̄_1 = s̄_1(t) = f_1 N_1,

(14) s̄_i = (f_i + s̄_{i−1} P_{i−1}) N_i,  i = 2, . . ., n.

It is straightforward to verify that these s̄_i satisfy the basic stock equation (12), so we have from
the theorem, when f_i(t) = f_i,

s_i(t) = s̄_i + Σ_{k=1}^{i} (s_k(0) − s̄_k) M_{ki}(t).
The steady-state component can also be written

(15) s̄_i = Σ_{k=1}^{i} f_k B_{ki} N_i,  i = 1, . . ., n.

Note that

Σ_{k=1}^{i} f_k B_{ki}

is a non-negative 1×w_i vector, so the limiting vector of stocks in grade i must be a non-negative
combination of the rows of N_i. Thus, in general, not all non-negative 1×w_i vectors are possible
limiting stock vectors under constant external flows.
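These relations are easy to check numerically. The sketch below (Python with NumPy; the two-grade transition data Q1, Q2, P1 and the flow vectors f1, f2 are purely hypothetical illustrations, not data from the paper) computes the fundamental matrices N_i = (I − Q_i)^{−1}, forms the steady-state stocks via (14), and iterates the basic stock equation to confirm the convergence asserted in part (iii) of the theorem:

```python
import numpy as np

# Hypothetical two-grade example: Q_i holds within-grade transition
# probabilities, P_1 routes grade-1 people into grade 2, and f_i is the
# constant external flow into grade i.  All numbers are illustrative.
Q1 = np.array([[0.6, 0.2],
               [0.0, 0.7]])
Q2 = np.array([[0.5]])
P1 = np.array([[0.1],
               [0.2]])                      # 2 grade-1 states -> 1 grade-2 state
f1 = np.array([10.0, 5.0])
f2 = np.array([2.0])

# Fundamental matrices N_i = (I - Q_i)^(-1)
N1 = np.linalg.inv(np.eye(2) - Q1)
N2 = np.linalg.inv(np.eye(1) - Q2)

# Steady-state stocks from (14)
s1_bar = f1 @ N1
s2_bar = (f2 + s1_bar @ P1) @ N2

# Iterate the basic stock equation s_i(t) = s_i(t-1) Q_i + s_{i-1}(t-1) P_{i-1} + f_i
s1, s2 = np.zeros(2), np.zeros(1)
for _ in range(200):
    s1, s2 = s1 @ Q1 + f1, s2 @ Q2 + s1 @ P1 + f2

# Part (iii): the actual stocks converge to the steady-state component
assert np.allclose(s1, s1_bar)
assert np.allclose(s2, s2_bar)
```

Starting the iteration from zero stocks is arbitrary; any initial stocks converge to the same s̄_i, since the transient term dies out as Q_i^t → 0.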


2. Linear Growth of External Flows 

In this subsection we consider the case in which the number of people hired into each state
increases by the same amount each period. Such a hiring policy may not be natural over a long
period of time, but it may provide a simple approximation to planned hiring policies.

Let the 1×w_i vector f_i be the increase in the number hired into states with FC i each period.
Then the external flow vector for FC i is

f_i(t) = t f_i,  t = 1, 2, . . .,  i = 1, . . ., n.

Let the vector function s̄_1(t) satisfy the basic stock equation (12),

s̄_1(t) = s̄_1(t−1) Q_1 + f_1(t).

Using the identity N_1 Q_1 + I = N_1 one can show that

s̄_1(t) = t f_1 N_1 − f_1 N_1 Q_1 N_1.

Thus from the theorem,

s_1(t) = t f_1 N_1 − f_1 N_1 Q_1 N_1 + (s_1(0) + f_1 N_1 Q_1 N_1) Q_1^t.

We note that s̄_1(t) is of the form

s̄_1(t) = t L_1 + C_1,

where L_1 = f_1 N_1 is a 1×w_1 vector and C_1 = −f_1 N_1 Q_1 N_1 is a 1×w_1 vector.

Consider some FC i ∈ {2, . . ., n}. Suppose that

s̄_{i−1}(t) = t L_{i−1} + C_{i−1},

where L_{i−1} and C_{i−1} are 1×w_{i−1} vectors. Using the identity

(t f_i N_i − f_i N_i Q_i N_i) Q_i + (t+1) f_i = (t+1) f_i N_i − f_i N_i Q_i N_i,

one may show that if

s̄_i(t) = t f_i N_i − f_i N_i Q_i N_i + s̄_{i−1}(t−1) P_{i−1} N_i − L_{i−1} P_{i−1} N_i Q_i N_i,

then s̄_i(t) satisfies the basic stock equation (12). Note that s̄_i(t) has the form

s̄_i(t) = t L_i + C_i,

where

(16) L_i = f_i N_i + L_{i−1} P_{i−1} N_i

and

C_i = −(f_i + L_{i−1} P_{i−1}) N_i Q_i N_i − (L_{i−1} − C_{i−1}) P_{i−1} N_i

= −((L_{i−1} − C_{i−1}) P_{i−1} + L_i Q_i) N_i.

Thus we have shown that when the external flows grow linearly, the steady-state component of
the stocks also grows linearly.

By recursive substitution in (16) we have

L_i = Σ_{k=1}^{i} f_k B_{ki} N_i.

Note that this vector gives the expected number of visits to states with FC i of the f_k = f_k(t+1) − f_k(t)
entrants with FC k, k = 1, . . ., i. That is, the growth in the stocks with FC i each period, L_i,
equals the expected number of visits to FC i of the growth in the external flows each period in the
FC's less than or equal to i.

Both L_i and C_i have the fundamental matrix N_i as a right factor, so the steady-state component
of the stock vector, s̄_i(t), must be a non-negative combination of the rows of N_i. This same
result was observed in the case of constant external flows.

In summary we have shown that by choosing

s̄_i(t) = t L_i + C_i

with

L_1 = f_1 N_1  when i = 1,
L_i = (f_i + L_{i−1} P_{i−1}) N_i,  i = 2, . . ., n,

C_1 = −f_1 N_1 Q_1 N_1  when i = 1,
C_i = −((L_{i−1} − C_{i−1}) P_{i−1} + L_i Q_i) N_i,  i = 2, . . ., n,

then from the theorem the stock equation may be written

s_i(t) = s̄_i(t) + Σ_{k=1}^{i} (s_k(0) − s̄_k(0)) M_{ki}(t).
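The linear-growth form can be verified in the same way as the constant-flow case. The sketch below (Python with NumPy; the single-grade data Q and f are hypothetical) computes L_1 and C_1 as above and checks that s̄(t) = tL + C satisfies the basic stock equation with external flow f(t) = tf:

```python
import numpy as np

# Hypothetical single-grade data: Q is the within-grade transition matrix
# and f the per-period increase in hiring, so the external flow is f(t) = t*f.
Q = np.array([[0.6, 0.2],
              [0.0, 0.7]])
f = np.array([4.0, 1.0])
N = np.linalg.inv(np.eye(2) - Q)        # fundamental matrix N = (I - Q)^(-1)

L = f @ N                               # growth per period of the stocks
C = -f @ N @ Q @ N                      # constant offset

def s_bar(t):
    return t * L + C                    # steady-state component t*L + C

# s_bar must satisfy the basic stock equation s(t) = s(t-1) Q + t*f
for t in range(1, 6):
    assert np.allclose(s_bar(t), s_bar(t - 1) @ Q + t * f)
```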


3. Geometric Growth of External Flows 

In this subsection we show that geometric growth of external flows leads (eventually) to geometric
growth of the stocks. We consider the case in which the external flows into the states in
grade i are proportional to a known vector f_i and grow geometrically at a rate θ_i. Thus,

f_i(t) = θ_i^t f_i,  t = 1, 2, . . .,  i = 1, . . ., n,  θ_i > 0.

When 0 < θ_i < 1, the external flows contract rather than grow.
If θ_k is not an eigenvalue of Q_i for k ≤ i ≤ n we may define

N_i(θ_k) = (I − Q_i/θ_k)^{−1}.

If the states in grade i have the 0-1 visiting property then all eigenvalues of Q_i are zero, and thus
θ_k > 0 is never equal to an eigenvalue of Q_i in this case.
The following identity will be useful:

N_i(θ_k) Q_i = θ_k Σ_{r=1}^{∞} (Q_i/θ_k)^r = θ_k(−I + N_i(θ_k)).

Then it can be shown that if

s̄_1(t) = θ_1^t f_1 N_1(θ_1),

then s̄_1(t), t = 0, 1, . . ., satisfies the basic stock equation, so from the theorem,

s_1(t) = θ_1^t f_1 N_1(θ_1) + (s_1(0) − f_1 N_1(θ_1)) M_{11}(t).

Note that the steady-state component of the FC 1 stock vector grows geometrically at the same
rate as the external flows into FC 1.

Now define

B_{ki}(θ_k) = Π_{m=k}^{i−1} (N_m(θ_k) P_m),  1 ≤ k < i ≤ n.

Then it can be shown that if

s̄_i(t) = Σ_{k=1}^{i} θ_k^{t−(i−k)} f_k B_{ki}(θ_k) N_i(θ_k),

then s̄_i(t), t = 0, 1, . . ., satisfies the basic stock equation (12). Note that in the limit the stocks
with FC i grow geometrically at the rate of the largest θ_k for k ≤ i,

θ^M = max {θ_k : k = 1, . . ., i}.

The steady-state component of the stock vector is not in general a non-negative combination of the
rows of N_i (as was the case with constant external flows and linear growth of external flows).
Rather, the steady-state stock distribution is a non-negative combination of the rows of N_i(θ^M).
The rows of N_i(θ^M) need not be non-negative combinations of the rows of N_i, so the limiting stock
distributions that are possible under geometric growth of external flows need not be the same as the
limiting stock distributions under constant external flows and linear growth of external flows.
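The geometric-growth claim for a single grade can also be checked directly. In the sketch below (Python with NumPy; Q, f and θ are hypothetical illustrative values, with θ chosen away from the eigenvalues of Q), s̄(t) = θ^t f N(θ) is verified against the basic stock equation with flow f(t) = θ^t f:

```python
import numpy as np

# Hypothetical single-grade data with geometrically growing external flows
# f(t) = theta**t * f; theta must not equal an eigenvalue of Q (here 0.6, 0.7).
Q = np.array([[0.6, 0.2],
              [0.0, 0.7]])
f = np.array([3.0, 2.0])
theta = 1.05

N_theta = np.linalg.inv(np.eye(2) - Q / theta)   # N(theta) = (I - Q/theta)^(-1)

def s_bar(t):
    return theta**t * (f @ N_theta)     # steady-state component

# s_bar must satisfy the basic stock equation s(t) = s(t-1) Q + theta**t * f
for t in range(1, 6):
    assert np.allclose(s_bar(t), s_bar(t - 1) @ Q + theta**t * f)
```

The check succeeds because N(θ)Q = θ(N(θ) − I), the identity used in the text.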


[1] Bartholomew, D. J., Stochastic Models for Social Processes, 2nd Edition (Wiley, 1973). 

[2] Blumen, I., M. Kogan, and P. McCarthy, The Industrial Mobility of Labor as a Probability 

Process (Cornell University, 1955). 
[3] Grinold, R. C. and K. T. Marshall, Manpower Planning Models (Elsevier-North Holland, 1977). 
[4] Kemeny, J. G. and J. L. Snell, Finite Markov Chains (Van Nostrand, 1960). 


[5] Lane, K. F. and J. E. Andrew, "A Method of Labour Turnover Analysis," Journal of the Royal 

Statistical Society, A118, 296-323 (1955). 
[6] Marshall, K. T., "A Comparison of Two Personnel Prediction Models," Operations Research, 

21, (3) 810-822 (1973). 
[7] RAND Corporation, "Planning in Large Personnel Systems: A Reexamination of the TOP 

LINE Static Planning Model," R-1274-PR (1973). 
[8] Rowland, K. M. and M. G. Sovereign, "Markov-Chain Analysis of Internal Manpower Supply," 

Industrial Relations, 9, (1) 88-99 (1969). 
[9] U.S. Navy, "Computer Models for Manpower and Personnel Management: State of Current 

Technology," (NAMPS Project Report 73-2), Naval Personnel Research and Development 

Laboratory (1973). 


Morris A. Cohen 

The Wharton School 

University of Pennsylvania 

Philadelphia, Pennsylvania 


This paper is concerned with the problem of simultaneously setting price and
production levels for an exponentially decaying product. Such products suffer a
loss in utility which is proportional to the total quantity of stock on hand. A continuous
review, deterministic demand model is considered. The optimal ordering
decision is derived and its sensitivity to changes in perishability and
product price is considered. The joint ordering-pricing decision is also computed,
and consideration of parametric changes of these decisions indicates a nonmonotonic
response of optimal price to changes in product decay. Issues of market
entry and extensions to a model with shortages are also analyzed.


Analysis of inventories of goods whose utility does not remain constant over time has involved
a number of different concepts of deterioration. It is possible to identify problems in which all
items in the inventory become obsolete at some fixed point in time (the style good problem), and
problems where the product deteriorates throughout the planning horizon. The class of products
subject to on-going deterioration can be broken down into those products with a maximum usable
lifetime (perishable products) and those without (decaying products).

Thus, for example, spare parts for military aircraft are style goods since they become obsolete
when a replacement model is introduced. Both blood and certain foods are examples of perishable
products with a maximum usable lifetime. Volatile liquids such as alcohol and gasoline are products
which decay and which do not have a maximum lifetime.

The decrease in utility or loss for an inventory of goods subject to deterioration is usually a
function of the total amount of inventory on hand. For goods without a maximum lifetime the
items in the inventory can be grouped together for the purpose of determining how much stock
will decay at a given point in time. It is clear that the amount of inventory which deteriorates in the
case where there is a maximum lifetime is a function of the age distribution of all items in the
inventory.

258 M. A. COHEN

This paper will be concerned with the problem of simultaneously setting price and production
levels for an exponentially decaying product. The earliest attempts at modeling such problems
were concerned with optimal production decisions only. Ghare and Schrader [6], assuming exponential
decay of the inventory in the face of constant demand, derived a revised form of the economic
order quantity. Emmons [5] also considered a problem of exponential decay where the product
decayed at one rate into a new product which decayed at a second rate. These models are applicable
to inventories of radioactive isotopes. In the first part of his thesis, Van Zyl [16] formulated a
general age independent perishable good model in which a fixed or stochastic amount of product,
depending on the total inventory, deteriorates. He demonstrated that for this class of inventory
models the optimal order policy is of the fixed critical number form and is thus characterized by a
constant order-up-to quantity.

The analysis of price as an inventory decision variable for a nonperishable product has been
undertaken by a number of authors (Whitin [17], Thomas [14], Karlin and Carr [7], Kunreuther
and Richard [8], Kunreuther and Schrage [9], Adams [1] and Pekelman [13]). For the most part
this literature has been concentrated on deterministic models (with some exceptions [7, 14]). The
only example of an analysis of pricing policy for a perishable product is due to Eilon and Mallya [4],
in which fairly strong assumptions are made on the form of the issuing sequence in force for a
maximum lifetime (perishable) product model. The analysis of ordering policies for perishable
products has been considered extensively in the last few years (see Cohen [2] for a recent survey).


We shall consider in this section a continuous review, deterministic demand model with
exponential decay. This model will provide some intuitive insights and analytic results which will
be useful in analysing the economic tradeoffs inherent in the control of perishable commodities.
We begin with a no-shortage assumption. In the section to follow, this assumption is relaxed and
the deterministic model is extended to allow for shortages with backlogging.

Let p stand for the selling price of the product and d(p) for the known demand rate when
the price is p. Let I(t) be the inventory position at time t and λ a positive number representing the
stock decay rate. As noted previously, perishability will be of the exponential type and hence the
rate at which stock decays will be proportional to the on-hand inventory, I(t). Demand rate d(p)
is assumed to be positive and to possess a negative derivative throughout its domain. In the case of
continuous review it is logical to assume that depletion due to such decay and depletion due to
meeting demand will occur simultaneously. Accordingly, the differential equation describing the
time behavior of the system is:

(1) dI(t)/dt = −λ I(t) − d(p).

It follows from the fact that this is a first order linear differential equation (as noted in Ghare
and Schrader [6]) that the solution to (1) is

(2) I(t) = I(0) e^{−λt} − (d(p)/λ)[1 − e^{−λt}].

Consequently, Z(t), the stock loss due to decay in the time interval [0, t], is the difference between
the inventory position at time t which would prevail if there were no decay and the
position with the decay:

(3) Z(t) = I(t)[e^{λt} − 1] − d(p) t + (d(p)/λ)[e^{λt} − 1].
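As a quick consistency check, the closed form (2) agrees with a direct numerical integration of (1). The sketch below (Python; λ, d and I(0) are arbitrary illustrative values, not parameters from the paper) uses a simple Euler scheme:

```python
import math

# Euler-integrate dI/dt = -lam*I - d on [0, 1] and compare with the
# closed form (2); lam, d and I0 are illustrative values.
lam, d, I0 = 0.1, 15.0, 100.0
dt, I = 1e-4, I0
for _ in range(int(1.0 / dt)):
    I += dt * (-lam * I - d)

closed = I0 * math.exp(-lam * 1.0) - (d / lam) * (1.0 - math.exp(-lam * 1.0))
assert abs(I - closed) < 1e-2           # first-order Euler error is O(dt)
```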



Since the cost structure to be defined includes holding costs, it is clearly optimal to set I(T) = 0,
where T is the period length of each cycle. Figure 1 illustrates the time behavior of the inventory
level. Demand rate d(p) is indicated by the slope of the dashed line.

Figure 1. Inventory Level vs. Time (No Shortages).

Given that I(T) = 0 and noting that the only loss to the system is due to either decay or
demand, the following expression for the quantity ordered each cycle results:

Q_T = Z(T) + d(p) T

= −d(p) T + (d(p)/λ)[e^{λT} − 1] + d(p) T

(4) = (d(p)/λ)[e^{λT} − 1].

Noting that I(0) = Q_T, it also follows that

(5) I(t) = (d(p)/λ)[e^{λ(T−t)} − 1].

Let us define unit purchase cost, order set-up cost and unit holding cost by c, K and h dollars,
respectively. Cost per cycle then becomes, for a fixed price level p,

C(T, p) = K + c Q_T + h ∫_0^T I(t) dt

(6) = K + c (d(p)/λ)[e^{λT} − 1] + h (d(p)/λ²)[e^{λT} − 1 − λT].

Cost per unit time, C̄(T, p), is

C̄(T, p) = C(T, p)/T

(7) = K/T + ([cλ + h] d(p)(e^{λT} − 1))/(λ²T) − h d(p)/λ.

By holding p fixed we can consider the necessary conditions for minimizing C̄(T, p) with respect to T:

(8) ∂C̄(T, p)/∂T = (1/T²){−K + (cλ + h) d(p)(λT e^{λT} − e^{λT} + 1)/λ²} = 0,


which implies that

(9) e^{λT}(λT − 1) = Kλ²/[d(p)(cλ + h)] − 1.

(9) can be easily solved numerically for T_p, the cost-minimizing† cycle length. An approximate
solution to (9) can be obtained by using a truncated Taylor series expansion for the exponential
function, i.e.,

e^{λT} ≈ 1 + λT + λ²T²/2,

which is a valid approximation for smaller values of λT. It follows that (7), the definition of cost
per unit time, becomes

(10) C̄(T, p) ≈ c d(p)[1 + λT/2] + K/T + h d(p) T/2

= c d(p) + (cλ + h) d(p) T/2 + K/T.

Thus it is clear that the Taylor series approximation for e^{λT} yields an inventory model without
decay but with holding cost cλ + h. The derivative of the approximate cost function is

dC̄(T, p)/dT = c d(p) λ/2 − K/T² + h d(p)/2 = 0

and so,

(11) T_p = √(2K/(d(p)[cλ + h])).

For fixed selling price p, the optimal cycle length decreases as decay rate λ increases. Moreover,
since demand d(p) has been assumed to decrease with increasing p, it is clear that the cycle
length will increase as the price increases. Thus highly perishable goods facing high demand will
be replenished more often. We note as well that when λ = 0 there is no perishability and (11) reduces
to the standard form.
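The quality of the Taylor-series cycle length (11) relative to the exact condition (9) can be examined numerically. In the sketch below (Python; the cost data follow the worked example, while the choices of p and λ are our own illustrations), the exact T is found by bisection, which applies because the left side of (9) is increasing in T:

```python
import math

# Cost data from the worked example (K = $250, c = $1/unit, h = $.5/unit/day,
# d(p) = 25 - .5p); the price and decay rate below are illustrative.
K, c, h, lam, p = 250.0, 1.0, 0.5, 0.1, 20.0
d = 25.0 - 0.5 * p

# Approximate cycle length from (11)
T_approx = math.sqrt(2 * K / (d * (c * lam + h)))

# Exact cycle length: root of (9), found by bisection
rhs = K * lam**2 / (d * (c * lam + h)) - 1.0
def g(T):
    return math.exp(lam * T) * (lam * T - 1.0) - rhs

lo, hi = 1e-9, 100.0                    # g(lo) < 0 < g(hi)
for _ in range(100):
    mid = 0.5 * (lo + hi)
    lo, hi = (mid, hi) if g(mid) < 0 else (lo, mid)
T_exact = 0.5 * (lo + hi)

assert abs(g(T_exact)) < 1e-9
# The approximation overstates the cycle length here, since lam*T is not small
assert T_approx > T_exact
```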

We can also consider the effects of variation in product perishability and price changes on
the optimal order decision. For comparative purposes we examine the optimal order rate. From
(4) and (11),

(12) Q_{T_p}/T_p = d(p)[e^{λT_p} − 1]/(λT_p)

≈ d(p)[1 + λT_p/2],

where the first term corresponds to the demand rate and the second term approximates the rate
at which units spoil. The sensitivity of the order rate to changes in perishability is determined by

∂/∂λ [Q_{T_p}/T_p] = d(p) T_p/2 + λ d(p)(∂T_p/∂λ)/2

= √(K d(p)/(2[cλ + h])) [(λc + 2h)/(2λc + 2h)] > 0.

† C̄(T, p) is in fact not convex for all values of the cost parameters. A sufficient condition for convexity is

e^{λT}[1 + (λT − 1)²] + 2Kλ²/((cλ + h) d(p)) > 4.

This condition will be easily satisfied with relatively high set-up costs K or larger values of λ.


Similarly, the order rate will respond to a price change as

∂/∂p [Q_{T_p}/T_p] = d′(p)[1 + λT_p/4] < 0.

We see then that the optimal order rate increases with an increase in decay rate λ and decreases with
increasing price when we assume price to be an external (market-controlled) parameter.

The validity of the Taylor series approximation for the exponential function decreases when
values of λ closer to 1 prevail. It is important to verify that the response of optimal cycle length
and order rate to changes in both price p and decay rate λ is consistent with the results derived for
the approximate cost function. An example problem was considered by solving (9) directly for
T_p. The corresponding optimal order rate Q_{T_p}/T_p was then computed. The results of the experiment
and the associated values of the cost parameters and values of p and λ are illustrated in Table 1.

The expected reactions, i.e.,

∂T_p/∂λ < 0,  ∂T_p/∂p > 0,  ∂[Q_{T_p}/T_p]/∂λ > 0  and  ∂[Q_{T_p}/T_p]/∂p < 0,

were all observed. The computations were repeated, with the same results, for a number of different
choices of the cost parameters.

Table 1. Optimal cycle length and order rate, for example with K = $250, c = $1/unit
and h = $.5/unit/day and the demand rate function d(p) = 25 − .5p (units/day)

[Tabulated values of T_p and Q_{T_p}/T_p for several combinations of price p and decay rate λ are not recoverable from the scan.]



In order to consider the optimal price decision we define the profit rate as a function of cycle
length and price,

(13) π(T, p) = p d(p) − C̄(T, p),

again using the exponential function approximation. Necessary conditions for maximizing π with
respect to T and p yield

(14) p_T = c(1 + λT_p/2) + hT_p/2 − d(p)/d′(p).

Under various assumptions on the form of d(p), equations (11) and (14) can be solved simultaneously
by numerical methods for the optimal price and period length (p*, T*). For example, if d(p) = A + Bp
for B < 0, it follows that

(15) p_T = (cλ + h)T_p/4 + c/2 − A/(2B)

and consequently, for a fixed period length, optimal price will increase with an increase in decay
rate λ or costs c and h.

Define π(T_p, p) to be the profit function with the optimal period length in effect. The joint
price/production problem is then equivalent to

max_p π(T_p, p).

Using (11) and approximation (13),

π(T_p, p) = p d(p) − c d(p){1 + (λ/2)√(2K/(d(p)[cλ + h]))}

− √(K d(p)[cλ + h]/2) − (h/2)√(2K d(p)/[cλ + h])

(16) = p d(p) − c d(p) − √(2K[cλ + h] d(p)),

which, as expected, is the standard expression for revenue less inventory holding costs in a system
with unit holding cost cλ + h. We note that

∂π(T_p, p)/∂p = d(p) + d′(p)[p − c − √(K[cλ + h]/(2 d(p)))] = 0,

so that

(17) p = −d(p)/d′(p) + c + √(K[cλ + h]/(2 d(p))),

which can be solved by successive approximation for various choices of demand rate function d.
We note as well that at p = c,

π(T_c, c) = −√(2K[cλ + h] d(c))

and so

(i) π(T_p, p) < 0 for p ≤ c

and also

(ii) ∂π(T_p, p)/∂p |_{p=c} = d(c) − d′(c)√(K[cλ + h]/(2 d(c))) > 0.


Therefore if we assume that d(p) belongs to the class of functions satisfying

(i) d′(p) < 0,

(ii) lim_{p→∞} d(p) = 0,

(iii) lim_{p→∞} p d(p) = 0,

then lim_{p→∞} π(T_p, p) = 0.

This class is fairly general and includes the truncated linear demand and the exponential demand
families. Hence the following has been established.

PROPOSITION 1: π(T_p, p) achieves its maximum at some possibly infinite p* > c and is
equal to

π(T_{p*}, p*) = p* d(p*) − c d(p*) − √(2K[cλ + h] d(p*)).

Thus it is possible to solve the joint price-production problem for a fairly extensive class of demand
functions. We note that maximum profit may be zero, in which case p* approaches infinity.
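Equation (17) can be solved by successive approximation, as noted. The sketch below (Python) does this for the worked example's linear demand d(p) = 25 − .5p and cost data, with an illustrative λ = .1; the damping in the iteration is our own addition, used only to keep the fixed-point step stable:

```python
import math

# Solve (17) for the example's linear demand d(p) = 25 - 0.5p with
# K = $250, c = $1, h = $0.5/unit/day and an illustrative lam = 0.1.
K, c, h, lam = 250.0, 1.0, 0.5, 0.1
d  = lambda p: 25.0 - 0.5 * p
dp = -0.5                                # d'(p) for the linear demand

def g(p):
    # right-hand side of (17)
    return -d(p) / dp + c + math.sqrt(K * (c * lam + h) / (2.0 * d(p)))

p = 25.0                                 # starting guess (must keep d(p) > 0)
for _ in range(200):
    p = 0.5 * (p + g(p))                 # damped fixed-point step

# At convergence p satisfies (17); the maximum profit here is positive.
profit = p * d(p) - c * d(p) - math.sqrt(2 * K * (c * lam + h) * d(p))
assert abs(p - g(p)) < 1e-8
assert profit > 0
```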

The variability of the optimal solution (p*, T*) to changes in decay rate λ can also be investigated
numerically for various cost coefficient configurations. Analysis of the previously discussed
example indicates that optimal price and order period do not behave monotonically with respect to
λ. These nonintuitive results are illustrated in Table 2.

The issue of market entry for a price-setting monopolist facing inventory costs and a downward
sloping demand curve was considered by Kunreuther and Richard [8]. The monopolist will enter
the market only when he can set price to achieve a strictly positive profit. This requirement leads
to the following relationship in our problem:

(p − c)/p > √(2K(cλ + h) d(p))/(p d(p)).

The fractional markup must exceed the ratio of inventory holding costs to revenue. This will be
achieved in an interval of prices contained in [c, ∞), and hence the optimal price will be achieved at
some finite price strictly greater than c. As λ increases the producer will adjust his optimal price

Table 2. Variation in optimal solution with respect to decay rate λ, for example with
K = $250, c = $1/unit, h = $.5/unit/day and demand function d(p) = 25 − .5p (units/day)

Decay Rate λ | Optimal Price p* | Optimal Cycle Length T* | Optimal Order Rate Q* | Profit at (p*, T*)

[Tabulated values for λ ranging from .10 upward are not recoverable from the scan.]



to remain profitable. As illustrated in Table 2, the possibility of positive profit decreases with
higher values of λ. Thus increased perishability will impose a barrier to market entry on the part
of the entrepreneur, since profits fall as the product becomes more perishable. It is important to note
that while the optimal price and cycle length decisions do not react monotonically to increases in λ,
there is a marked stability in the value of the optimal price. Thus the tradeoff between revenue
and loss due to decay may lead to an unexpected pattern of pricing and ordering decisions. For the
particular example of Table 2 we observe that for low values of λ the optimal reaction to increased
perishability is to increase price. In the range of higher values of λ an optimal reaction to increased
perishability is to decrease price. Cycle length decreases with λ, and order rate increases after an
initial decrease.
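Since the right side of the entry inequality is exactly the inventory-cost term of (16) divided by revenue, the entry condition is equivalent to π(T_p, p) > 0, which is easy to confirm numerically (Python; the prices sampled are arbitrary and the cost data are those of the worked example):

```python
import math

# Equivalence of "positive profit" and the markup condition, checked at a few
# arbitrary prices with K=$250, c=$1, h=$.5, lam=.1, d(p) = 25 - .5p.
K, c, h, lam = 250.0, 1.0, 0.5, 0.1

def d(p):
    return 25.0 - 0.5 * p

def profit(p):                          # profit rate, from (16)
    return p * d(p) - c * d(p) - math.sqrt(2 * K * (c * lam + h) * d(p))

def markup_holds(p):                    # fractional markup exceeds cost/revenue ratio
    return (p - c) / p > math.sqrt(2 * K * (c * lam + h) * d(p)) / (p * d(p))

for p in (1.2, 5.0, 15.0, 26.5, 45.0):
    assert (profit(p) > 0) == markup_holds(p)
```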

We turn next to the extension of the model to the case where shortages are allowed to occur. 


The model of the previous section is now extended to allow for complete backlogging of excess
demands. Figure 2 illustrates the time behavior of inventory in this case. Stock is depleted by a
combination of spoilage and demand in the interval [0, T_1] and is backlogged as a result of excess
demand in the interval [T_1, T]. The loss of stock due to decay within the cycle of length T is given by

(18) Z(T_1) = −d(p) T_1 + d(p)[e^{λT_1} − 1]/λ.

Backlogged demand within the cycle, B, is defined by:

(19) B(T_1) = d(p)[T − T_1]

and order quantity, Q, is the sum of satisfied demand, backlogged demand and loss due to decay:

(20) Q = Z(T_1) + d(p) T.

Figure 2. Inventory Level vs Time (Shortages). 


By using the same cost structure as before, with the addition of s as the unit shortage cost rate
(to avoid confusion with price p), we can derive cost per cycle as follows:

C(T, T_1, p) = K + cQ + h ∫_0^{T_1} I(t) dt + s ∫_0^{T−T_1} d(p) t dt

= K + c d(p){T − T_1 + [e^{λT_1} − 1]/λ} + h d(p)[e^{λT_1} − 1 − λT_1]/λ² + s d(p)[T − T_1]²/2.

We can also derive cost per unit time by dividing the above expression by total cycle length T:

(21) C̄(T, T_1, p) = C(T, T_1, p)/T

= K/T + c d(p){(T − T_1)/T + (e^{λT_1} − 1)/(λT)} + h d(p)(e^{λT_1} − 1 − λT_1)/(λ²T) + s d(p)(T − T_1)²/(2T).

Using the previously defined approximation for the exponential function yields the following:

(22) C̄(T, T_1, p) ≈ c d(p){(T − T_1)/T + (T_1 + λT_1²/2)/T} + K/T + h d(p) T_1²/(2T) + s d(p)(T − T_1)²/(2T)

= c d(p)(1 + λT_1²/(2T)) + K/T + h d(p) T_1²/(2T) + s d(p)(T − T_1)²/(2T).

Let T_1/T = η be the fraction of the cycle in which there is no excess demand. Cost per unit time
can then be expressed as a function of (T, η, p) as follows:

(23) C̄(T, η, p) = c d(p) + K/T + [cλη² + hη² + s(1 − η)²] d(p) T/2.

For fixed price p, C̄(T, η, p) must be minimized with respect to T and η:

∂C̄(T, η, p)/∂T = c d(p)λη²/2 − K/T² + h d(p)η²/2 + s d(p)(1 − η)²/2 = 0,

which yields

(24) T_{p,η} = √(2K/(d(p)[cλη² + hη² + s(1 − η)²])).

We can restrict the model to exclude shortages by setting η = 1, which yields the previously derived
result (11).

Setting

∂C̄(T, η, p)/∂η = 0

yields the result

η = s/(cλ + h + s),

and so lim_{s→∞} η = 1, as expected. It is interesting to note that as λ decreases, and hence as the product
becomes less perishable, η increases and thus the fraction of the cycle spent backlogging demand
decreases. Thus decreased perishability has the effect of raising the relative shortage cost. The
analysis of optimal order rate can be carried out as in the case of the no-shortage model. Similar
conclusions can be derived.
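The optimal in-stock fraction and the resulting cycle length from (24) can be illustrated numerically (Python; cost data as in the earlier example, with an illustrative shortage cost rate s and a fixed demand rate):

```python
import math

# Cost data as in the earlier example plus an illustrative unit shortage
# cost rate s; d is the demand rate at the (fixed) price under study.
K, c, h, lam, s = 250.0, 1.0, 0.5, 0.1, 2.0
d = 15.0

eta = s / (c * lam + h + s)             # optimal in-stock fraction of the cycle
T = math.sqrt(2 * K / (d * (c * lam * eta**2 + h * eta**2 + s * (1 - eta)**2)))

# With eta = 1 (shortages excluded), (24) reduces to the earlier result (11)
T_no_short = math.sqrt(2 * K / (d * (c * lam + h)))

assert 0 < eta < 1
assert T > T_no_short                   # allowing backlogging lengthens the cycle
```

As s grows, η = s/(cλ + h + s) → 1 and T_{p,η} falls back toward (11).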


In order to analyze the price decision we must again consider the profit rate function,

π(T, η, p) = p d(p) − C̄(T, η, p).

Differentiating with respect to p,

∂π(T, η, p)/∂p = d(p) + d′(p)[p − c(1 + λη²T/2) − hη²T/2 − s(1 − η)²T/2] = 0,

so that

(25) p_{T,η} = c(1 + λη²T/2) + hη²T/2 + s(1 − η)²T/2 − d(p)/d′(p).

For the special case of a linear demand function,

p_{T,η} = [cλη² + hη² + s(1 − η)²]T/4 + c/2 − A/(2B).

Thus, again for this case, for fixed values of T and η, the optimal price decreases as λ decreases.

We can also examine the profit function when the optimal cycle length T_{p,η} is in force. This
yields the following:

π(T_{p,η}, p) = p d(p) − c d(p) − c d(p)λη²T_{p,η}/2 − K/T_{p,η} − h d(p)η²T_{p,η}/2 − s d(p)(1 − η)²T_{p,η}/2

= p d(p) − c d(p) − √(2K d(p)[cλη² + hη² + s(1 − η)²]),

which is again revenue less inventory carrying costs. First order conditions for a maximum yield

∂π(T_{p,η}, p)/∂p = d(p) + d′(p)[p − c − √(K[cλη² + hη² + s(1 − η)²]/(2 d(p)))] = 0,

and thus the profit-maximizing price satisfies

(26) p = −d(p)/d′(p) + c + √(K[cλη² + hη² + s(1 − η)²]/(2 d(p))),

which is a direct extension of (17). At p = c,

π(T_{c,η}, c) = −√(2K d(c)[cλη² + hη² + s(1 − η)²]) < 0

and

∂π(T_{p,η}, p)/∂p |_{p=c} = d(c) − d′(c)√(K[cλη² + hη² + s(1 − η)²]/(2 d(c))) > 0.

It also follows that when lim_{p→∞} p d(p) = 0 holds, lim_{p→∞} π(T_{p,η}, p) = 0, as in the no-shortage case. Thus
the following result has been established.

PROPOSITION 2: π(T_{p,η}, p) achieves its maximum in the shortage case at a possibly infinite
p* > c (solving (26)) and is equal to

π(T_{p*,η}, p*) = p* d(p*) − c d(p*) − √(2 d(p*) K[cλη² + hη² + s(1 − η)²]).

The condition for market entry which will ensure that maximum profit is achieved at a finite price
is given by

(p − c)/p > √(2K[cλη² + hη² + s(1 − η)²] d(p))/(p d(p)).

Thus we have seen that the shortage model represents a direct extension of the no-shortage model.
The influence of perishability, through λ, has now become more complex due to the dependence of
p*, T_{p,η} and η on λ.


The deterministic model presented in this paper can be extended to the case of stochastic
demand by making assumptions on the nature of the process generating the demand. Multiplicative
and additive demand factor models were considered in a paper on pricing by Karlin and Carr [7]
in which there was no perishability. Leland [10] extended this analysis to an arbitrary demand
process. The impact of perishability on a stochastic model with pricing as a decision variable was
recently considered (after the preparation of this paper) by Thowsen [15], in which the existence
of an optimal joint production-price policy for a stochastic demand model is demonstrated. It
should be pointed out that the yet to be solved pricing problem for the maximum lifetime (perishable)
product system represents a fundamental problem, since the utility of the consumer with
respect to aged goods must be taken into account.

This paper then is a first step in analyzing the interaction effect of perishability with optimal
pricing and ordering decisions. The results indicate the importance of formulating and solving an
appropriate inventory model when perishability and the opportunity for pricing occur simultaneously.


[1] Adams, C. R., "A Monopolist's Revision of Mathematical Inventory Theory," presented at 
the 43rd National Meeting, Operations Research Society of America (1973). 

[2] Cohen, M. A., Inventory Control for a Perishable Product: Optimal Critical Number Ordering
and Applications to Blood Inventory Management, Ph.D. dissertation (Northwestern University,
Evanston, Ill., 1974).

[3] Cohen, M. A., "Analysis of Single Critical Number Ordering Policies for Perishable Inven- 
tories," Operations Research 24, 726-741 (1976). 

[4] Eilon, S. and R. V. Mallya, "Issuing and Pricing Policies for Semi-perishables," Proceedings 
of 4th International Conference on Operational Research (Wiley-Interscience, 1966). 

[5] Emmons, H., "A Replenishment Model for Radioactive Nuclide Generators," Management 
Science, 14, 263-274 (1968). 

[6] Ghare, P. M. and G. F. Schrader, "A Model for an Exponential Decaying Inventory,"
The Journal of Industrial Engineering 14, 238-243 (1963).


[7] Karlin, S. and C. B. Carr, "Prices and Optimal Inventory Policy," in Studies in Applied 
Probability and Management Science, ed. K. J. Arrow, S. Karlin and H. Scarf (Stanford 
University Press, Stanford, 1962). 
[8] Kunreuther, H. and J. F. Richard, "Optimal Pricing and Inventory Decisions for Non- 
Seasonal Items," Econometrica, 39, 173-175 (1975). 
[9] Kunreuther, H. and L. Schrage, "Joint Pricing and Inventory Decisions for Constant Priced
Items," Management Science 7, 732-738 (1973).

[10] Leland, H. E., "Theory of the Firm Facing Uncertain Demand," American Economic Review, 
52, 278-291 (1972). 

[11] Nahmias, S., "Myopic Approximations for the Perishable Inventory Problem," Management 
Science, 9, 1002-1008, (1976). 

[12] Nahmias, S. and W. Pierskalla, "A Two Product Perishable/Non-Perishable Inventory 
Problem," SIAM Journal of Applied Mathematics, 30, 483-500 (1976). 

[13] Pekelman, D., "Simultaneous Price-Production Decisions," Operations Research, 22, 788-794 

[14] Thomas, L. J., "Price and Production Decisions with Random Demand," Operations Re- 
search, 22, 513-518 (1974). 

[15] Thowsen, G. T., "A Dynamic, Nonstationary Inventory Problem for a Price/Quantity Setting 
Firm," Naval Research Logistics Quarterly, 22, 461-476. 

[16] Van Zyl, G., Inventory Control for Perishable Commodities. Ph.D. dissertation (Universitj 1, of 
North Carolina, 1964). 

[17] Whiten, T. M., "Inventory Control and Price Theory," Management Science, 2, 61-68 (1955). 


Hubert J. Chen 

The University of Georgia 
Athens, Georgia 


There are given k (≥2) univariate cumulative distribution functions (c.d.f.'s) G(x; θ_i) indexed by a real-valued parameter θ_i, i = 1, . . ., k. Assume that G(x; θ_i) is stochastically increasing in θ_i. In this paper interval estimation of the i-th smallest of the θ's and related topics are studied. Applications are considered for a location parameter, a normal variance, a binomial parameter, and a Poisson parameter.


Suppose that we have k populations π_1, . . ., π_k, with cumulative distribution functions (c.d.f.'s) F(x; θ_i) (1 ≤ i ≤ k), where θ_i is a single parameter of F(x; θ_i). The goal is interval estimation of the i-th smallest of θ_1, . . ., θ_k, denoted by θ_[i], which is unknown. Let X_i, i = 1, . . ., k, be mutually independent random samples, each of size n, from population π_i, i = 1, . . ., k, respectively. Let T_i = T_n(X_i) be an estimator of θ_i with c.d.f. G_n(t; θ_i). It is assumed that the family G_n(t; θ) is stochastically increasing in θ, that is, G_n(t; θ') ≥ G_n(t; θ'') if θ' < θ'' for all t. Applications are considered when θ is a location parameter, a scale parameter (e.g., a normal variance), and a parameter of a specific family with monotone likelihood ratio, e.g., a binomial parameter and a Poisson parameter. Further applications are mentioned when θ is a noncentrality parameter of a noncentral t-distribution and when θ is a correlation coefficient of a bivariate normal distribution.

In recent years several authors have considered the interval estimation of ranked parameters, among them Chen and Dudewicz [3], Dudewicz [4, 5, 6], Dudewicz and Tong [8], Saxena and Tong [15], Saxena [14], Alam, Saxena and Tong [2], and Rizvi and Saxena [13]. Alam and Saxena [1] have also considered a parameter in a stochastically increasing family.


In this section we give a basic method for constructing confidence intervals for θ_[i] (1 ≤ i ≤ k) by using the i-th smallest of T_1, . . ., T_k, denoted by T_[i].

LEMMA (2.1): For any i (1 ≤ i ≤ k), the c.d.f. of the i-th smallest statistic T_[i], H_{T_[i]}(t), is a non-increasing function of θ_l (1 ≤ l ≤ k).

*This research was supported by the U.S. Army Research Office, Durham.



PROOF: Fix l (1 ≤ l ≤ k). For t between T_* and T*, the lower and the upper limits of T_[i], let θ = (θ_1, . . ., θ_k); then

H_{T_[i]}(t) = P_θ(at least i of T_1, . . ., T_k are ≤ t)

= P_θ(T_l ≤ t and at least i−1 of T_1, . . ., T_{l−1}, T_{l+1}, . . ., T_k are ≤ t) + P_θ(T_l > t and at least i of T_1, . . ., T_{l−1}, T_{l+1}, . . ., T_k are ≤ t)

= G_n(t; θ_l){P_θ(at least i−1 of T_1, . . ., T_{l−1}, T_{l+1}, . . ., T_k are ≤ t) − P_θ(at least i of T_1, . . ., T_{l−1}, T_{l+1}, . . ., T_k are ≤ t)} + P_θ(at least i of T_1, . . ., T_{l−1}, T_{l+1}, . . ., T_k are ≤ t),

which is a non-increasing function of θ_l because (i) the term in braces is nonnegative, (ii) P_θ(at least i of T_1, . . ., T_{l−1}, T_{l+1}, . . ., T_k are ≤ t) does not involve θ_l, and (iii) G_n(t; θ_l) is a non-increasing function of θ_l by assumption.

ASSUMPTION (2.2): G_n(t; θ) is said to be degenerate at θ_* and θ* if

(2.2a) G_n(t; θ) = P_θ(T ≤ t) = 1 for θ = θ_*,

(2.2b) G_n(t; θ) = P_θ(T ≤ t) = 0 for θ = θ*,

where θ_* and θ* are the smallest and the largest possible values of θ, respectively, and t ∈ (T_*, T*).

In the following we will consider a random interval I for θ_[i] (1 ≤ i ≤ k). For preassigned γ ∈ (0, 1), we say the event CD ("Correct Decision") occurs iff, for any i (1 ≤ i ≤ k), θ_[i] ∈ I. One usually tries to develop a procedure R for the construction of such intervals I in such a way that

(2.3) Inf_{θ∈Ω(θ_[i])} P_θ(CD|R) ≥ γ

where

Ω(θ_[i]) = {θ = (θ_[1], . . ., θ_[k]) | θ_[i] is held fixed}.

A. Lower Confidence Intervals 

PROCEDURE R_L: Let h_1 be a continuous increasing function with inverse g_2. Define

(2.4) I_L = (h_1(T_[i]|γ_1), θ*)

to be a lower confidence interval for θ_[i] with coverage probability γ_1. We have CD iff θ_[i] ∈ I_L.

THEOREM (2.5): Under procedure R_L, if Assumption (2.2b) holds, then

Inf_{θ∈Ω(θ_[i])} P_θ(CD|R_L) = {G_n(g_2(θ_[i]); θ_[i])}^i,

and given γ_1 ∈ (0, 1), k, n, and i (1 ≤ i ≤ k), the end point h_1(T_[i]|γ_1) can be obtained as the unique solution in θ_[i] of

(2.6) G_n(g_2(θ_[i]); θ_[i]) = G_n(T_[i]; θ_[i]) = γ_1^{1/i}.


PROOF: By Lemma (2.1) and Assumption (2.2b), we have

P_θ(CD|R_L) = P_θ(T_[i] ≤ g_2(θ_[i])) ≥ P_θ(T_[i] ≤ g_2(θ_[i]) | θ = (θ_[i], . . ., θ_[i], θ*, . . ., θ*))

= P_{θ_[i]}(max(Y_1, . . ., Y_i) ≤ g_2(θ_[i]))

= {G_n(g_2(θ_[i]); θ_[i])}^i,

where Y_1, . . ., Y_i are i.i.d. r.v.'s with c.d.f. G_n(t; θ_[i]). The rest of the proof of this theorem is omitted.

PROCEDURE R_U: Let h_2 be a continuous increasing function with inverse g_1. Define

(2.7) I_U = (θ_*, h_2(T_[i]|γ_2))

to be an upper confidence interval for θ_[i] with coverage probability γ_2. We have CD iff θ_[i] ∈ I_U.

THEOREM (2.8): Under procedure R_U, if Assumption (2.2a) holds, then

Inf_{θ∈Ω(θ_[i])} P_θ(CD|R_U) = {1 − G_n(g_1(θ_[i]); θ_[i])}^{k−i+1},

and given γ_2 ∈ (0, 1), k, n, and i (1 ≤ i ≤ k), the end point h_2(T_[i]|γ_2) can be obtained as the unique solution in θ_[i] of

(2.9) G_n(g_1(θ_[i]); θ_[i]) = G_n(T_[i]; θ_[i]) = 1 − γ_2^{1/(k−i+1)}.

PROOF: By Lemma (2.1) and Assumption (2.2a), the proof is similar to that of Theorem (2.5) and will be omitted.

B. Two-Sided Confidence Intervals 

In Part A we have constructed a lower and an upper confidence interval for θ_[i] with coverage probabilities at least γ_1 and γ_2 (0 < γ_1, γ_2 < 1), respectively. Here we will adopt the method used by Dudewicz [5] and obtain a class of two-sided confidence intervals on θ_[i] (1 ≤ i ≤ k) if Assumption (2.2) holds.

PROCEDURE R_T: Let

(2.10) I_T = (h_1(T_[i]|γ_1), h_2(T_[i]|γ_2))

be the intersection of the lower interval I_L and the upper interval I_U with coverage probabilities at least γ_1 and γ_2, respectively. We have CD iff θ_[i] ∈ I_T.

THEOREM (2.11): Let γ_1, γ_2 (0 < γ_1, γ_2 < 1) be such that γ_1 + γ_2 − 1 = α (α fixed and 0 < α < 1). Then, under R_T, the interval I_T is a two-sided confidence interval for θ_[i] with coverage probability at least α.


PROOF: Define A = {θ_[i] ≥ h_1(T_[i]|γ_1)} and B = {θ_[i] ≤ h_2(T_[i]|γ_2)}; one intersects the lower and the upper intervals and proceeds as in the proof used by Dudewicz [5].

The length of I_T is given by

(2.12) L(γ_1) = h_2(T_[i]|α + 1 − γ_1) − h_1(T_[i]|γ_1).

The optimum choice of γ_1 which minimizes the length L(γ_1) can be determined by numerical methods.

Alam and Saxena [1] have also constructed a two-sided confidence interval for θ_[i].¹ They started from a general formulation of the two-sided interval by using the incomplete beta function, while our approach is to intersect two one-sided intervals to obtain the desired two-sided interval. When Assumption (2.2) holds, our results and theirs coincide. Assumption (2.2) is basic to our work. For example, when T is a normal r.v. with mean μ and variance one, then P(T ≤ t) = Φ(t − μ) approaches 1 as μ → −∞ and 0 as μ → +∞, where Φ(·) is the c.d.f. of a standard normal r.v. As a second example, when T is a Chi-square r.v. times σ², then P(T ≤ t) = G(t/σ²) approaches 1 as σ² → 0 and 0 as σ² → ∞, where G(·) is the c.d.f. of a Chi-square r.v. Theorems (2.5) and (2.8) are based on this property.

THEOREM (2.13): For any i (1 ≤ i ≤ k), the lower confidence interval I_L on θ_[i] which has minimal coverage probability γ_1 has maximal coverage probability 1 − (1 − γ_1^{1/i})^{k−i+1}; and the upper confidence interval I_U on θ_[i] which has minimal coverage probability γ_2 has maximal coverage probability 1 − (1 − γ_2^{1/(k−i+1)})^i.

PROOF: By Lemma (2.1) and Assumption (2.2) we have

P_θ(T_[i] ≤ g_2(θ_[i])) ≤ P(T_[i] ≤ g_2(θ_[i]) | θ = (θ_*, . . ., θ_*, θ_[i], . . ., θ_[i]))

= P_{θ_[i]}(min(Y_1, . . ., Y_{k−i+1}) ≤ g_2(θ_[i])),

where Y_1, . . ., Y_{k−i+1} are i.i.d. r.v.'s with c.d.f. G_n(t; θ_[i]). Hence

Sup_{θ∈Ω(θ_[i])} P_θ(T_[i] ≤ g_2(θ_[i])) = P_{θ_[i]}(min(Y_1, . . ., Y_{k−i+1}) ≤ g_2(θ_[i]))

= 1 − {1 − G_n(g_2(θ_[i]); θ_[i])}^{k−i+1} = 1 − {1 − γ_1^{1/i}}^{k−i+1}

since

G_n(g_2(θ_[i]); θ_[i]) = γ_1^{1/i} by (2.6).

The proof of the upper confidence interval case is similar to that of the lower one and will be omitted.
¹After these results were obtained, the author received a manuscript of Alam and Saxena [1] which independently developed similar results on interval estimation of a ranked parameter.



Table (2.14) of 1 − (1 − γ^{1/i})^{k−i+1} illustrates the maximal degree of overprotection when k = 4 and 5.

Table (2.14). 1 − (1 − γ^{1/i})^{k−i+1}

[Tabulated values for k = 4 and k = 5, i = 1, . . ., k, at γ = 0.99, 0.95, . . ., 0.10; the tabular layout could not be recovered from the scan.]

LEMMA (2.15): If X and Y are independent r.v.'s with F_X(x) = P(X ≤ x) ≤ P(Y ≤ x) = F_Y(x), x ∈ R, then E(X) ≥ E(Y).

THEOREM (2.16): For any i (1 ≤ i ≤ k), by Lemma (2.1),

Inf{E_θ(T_[i]); θ ∈ Ω(θ_[i])} = E(i-th smallest of T_1, . . ., T_k) at θ = (θ_*, . . ., θ_*, θ_[i], . . ., θ_[i]), with i−1 components equal to θ_* and k−i+1 equal to θ_[i], and

Sup{E_θ(T_[i]); θ ∈ Ω(θ_[i])} = E(i-th smallest of T_1, . . ., T_k) at θ = (θ_[i], . . ., θ_[i], θ*, . . ., θ*), with i components equal to θ_[i] and k−i equal to θ*.

PROOF OF THE INFIMUM CASE: Fix any i (1 ≤ i ≤ k); by Lemmas (2.1) and (2.15) we have

E_θ(T_[i]) = E_θ(i-th smallest of T_1, . . ., T_k) ≥ E(i-th smallest of T_1, . . ., T_k) at θ = (θ_*, . . ., θ_*, θ_[i], . . ., θ_[i]),

which completes the proof. The proof of the supremum case is similar to that of the infimum one and will be omitted.

If Assumption (2.2) holds, then

Inf{E_θ(T_[i]); θ ∈ Ω(θ_[i])} = E_{θ_[i]}(min(T_1, . . ., T_{k−i+1})),

Sup{E_θ(T_[i]); θ ∈ Ω(θ_[i])} = E_{θ_[i]}(max(T_1, . . ., T_i)),

where T_1, . . ., T_{k−i+1} or T_1, . . ., T_i are i.i.d. r.v.'s with parameter θ_[i].



A. Location Parameter 

Suppose that G_n(t; θ_i) is a continuous c.d.f. and G_n(t; θ_i) = G_n(t − θ_i) for t ∈ (T_*, T*) and θ_i ∈ (θ_*, θ*), i = 1, . . ., k. Then, from (2.4), (2.6), (2.7), (2.9), (2.10), and Theorem (2.11), the lower, the upper, and the two-sided confidence intervals for θ_[i] with coverage probabilities γ_1, γ_2, and γ_1 + γ_2 − 1 = α, respectively, are given by

I_L = (T_[i] − G_n^{−1}(γ_1^{1/i}), θ*),

I_U = (θ_*, T_[i] − G_n^{−1}(1 − γ_2^{1/(k−i+1)})),

I_T = (T_[i] − G_n^{−1}(γ_1^{1/i}), T_[i] − G_n^{−1}(1 − γ_2^{1/(k−i+1)})),

where G_n^{−1}(·) is the inverse of G_n(·).

For normal populations with known variances these confidence intervals, with T_[i] = X̄_[i] and G_n(·) = Φ(·), for θ_[i] = μ_[i], are the results of Dudewicz [4, 5, 6].
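For the normal-means case the endpoints are available in closed form; the sketch below (hypothetical data and confidence levels, using SciPy for Φ^{−1}) computes the lower and upper endpoints for the i-th smallest mean with known common variance:

```python
import numpy as np
from scipy.stats import norm

def ranked_mean_ci(xbar, i, n, sigma=1.0, g1=0.975, g2=0.975):
    """Lower/upper endpoints for the i-th smallest mean, following (2.4)-(2.10).

    For the sample mean, G_n^{-1}(p) = (sigma/sqrt(n)) * Phi^{-1}(p)."""
    k = len(xbar)
    t_i = np.sort(xbar)[i - 1]                 # T_[i], the i-th smallest mean
    se = sigma / np.sqrt(n)
    lower = t_i - se * norm.ppf(g1 ** (1.0 / i))
    upper = t_i - se * norm.ppf(1.0 - g2 ** (1.0 / (k - i + 1)))
    return lower, upper                        # two-sided coverage >= g1 + g2 - 1

lo, hi = ranked_mean_ci([0.3, 1.1, 2.4, 0.7], i=2, n=25)
```

The two-sided interval is the intersection of the one-sided pieces, as in Theorem (2.11).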

B. Scale Parameter 

Suppose that G_n(t; θ_i) is a continuous c.d.f. and G_n(t; θ_i) = G_n(t/θ_i) for all t ∈ (T_*, T*) and θ_i ∈ (θ_*, θ*), i = 1, . . ., k. Then, from (2.4), (2.6), (2.7), (2.9), (2.10), and Theorem (2.11), the lower, the upper, and the two-sided confidence intervals for θ_[i] with coverage probabilities γ_1, γ_2, and γ_1 + γ_2 − 1 = α, respectively, are given by

I_L = (T_[i]/G_n^{−1}(γ_1^{1/i}), θ*),

I_U = (θ_*, T_[i]/G_n^{−1}(1 − γ_2^{1/(k−i+1)})),

I_T = (T_[i]/G_n^{−1}(γ_1^{1/i}), T_[i]/G_n^{−1}(1 − γ_2^{1/(k−i+1)})).

In the following we give an example of the normal variances case.

Let observations from π_i (1 ≤ i ≤ k) be independent and normally distributed N(μ_i, σ_i²) with scale parameter σ_i² unknown. The goal is to estimate σ²_[i], the i-th smallest variance. Let S_1², . . ., S_k² be mutually independent unbiased variance estimators, each based on size n, for σ_1², . . ., σ_k² from populations π_1, . . ., π_k, respectively. We know that S_i² has σ_i² as a scale parameter, with p.d.f. g_n(x; σ_i²) and c.d.f. G_n(x; σ_i²) which depend on x, σ_i² only through x/σ_i², i = 1, . . ., k; g_n(x) = g_n(x; 1) and G_n(x) = G_n(x; 1) are respectively the p.d.f. and c.d.f. of a chi-square/(n−1) r.v. with (n−1) d.f.

The theorems and lemmas of Section 2 can be applied to the above normal variances case as follows.

(1). For any i (1 ≤ i ≤ k), the c.d.f. of S²_[i], namely H_{S²_[i]}(x), decreases as σ_l² (1 ≤ l ≤ k) increases;

(2). For any i (1 ≤ i ≤ k),

(3.1) Inf{E_{σ²}(S²_[i]); σ² ∈ Ω(σ²_[i])} = σ²_[i] h'_{k−i+1}(g_n)

and

(3.2) Sup{E_{σ²}(S²_[i]); σ² ∈ Ω(σ²_[i])} = σ²_[i] h''_i(g_n)

where

h'_{k−i+1}(g_n) = ∫_0^∞ y (k−i+1)[1 − G_n(y)]^{k−i} g_n(y) dy,

h''_i(g_n) = ∫_0^∞ y i[G_n(y)]^{i−1} g_n(y) dy;

(3). For any i (1 ≤ i ≤ k),

(3.3) σ²_[i] h'_{k−i+1}(g_n) ≤ E_{σ²}(S²_[i]) ≤ σ²_[i] h''_i(g_n);

(4). For any i (1 ≤ i ≤ k), an upper confidence interval for σ²_[i] with coverage probability at least γ_2 is given by (0, S²_[i]/b_n), where b_n is the solution of

(3.4) (1 − G_n(b_n))^{k−i+1} = γ_2

and G_n(y) is the c.d.f. of a chi-square/(n−1) r.v. with n−1 d.f.;

(5). For any i (1 ≤ i ≤ k), a lower confidence interval for σ²_[i] with coverage probability at least γ_1 is given by (S²_[i]/a_n, ∞), where a_n is the solution of

(3.5) G_n(a_n)^i = γ_1;

(6). For any i (1 ≤ i ≤ k), the upper confidence interval of (4) on σ²_[i] which has minimal probability of coverage γ_2 has maximal probability of coverage 1 − (1 − γ_2^{1/(k−i+1)})^i; and the lower confidence interval of (5) on σ²_[i] which has minimal probability of coverage γ_1 has maximal probability of coverage 1 − (1 − γ_1^{1/i})^{k−i+1};

(7). In (4) and (5) we have obtained an upper and a lower confidence interval for σ²_[i] with coverage probabilities at least γ_2 and γ_1 (0 < γ_1, γ_2 < 1), respectively. We now obtain a class of two-sided confidence intervals for σ²_[i] (1 ≤ i ≤ k) by Theorem (2.11); the intervals are of the form

(S²_[i]/G_n^{−1}(γ_1^{1/i}), S²_[i]/G_n^{−1}(1 − γ_2^{1/(k−i+1)})),

where G_n^{−1}(·) is the inverse of the c.d.f. G_n(·) of a chi-square/(n−1) r.v. with (n−1) d.f.

Values of b_n and a_n are given in Tables (3.6) and (3.7) for any i (1 ≤ i ≤ k) and any k (1 ≤ k ≤ 5). For example, given γ = 0.95, k = 5, i = 3, and d.f. = 30, to find the coefficient b_n of an upper confidence interval on σ²_[3], we have k − i + 1 = 5 − 3 + 1 = 3; reading Table (3.6) in the column headed 3 gives b_n = 0.5321 at d.f. = 30. The coefficient a_n of a lower confidence interval on σ²_[3] is found in column 3 of Table (3.7): a_n = 1.6226 at d.f. = 30. The figures of a_n and b_n are accurate to the fourth decimal place (within ±1 unit) according to the error analysis of Dudewicz, Ramberg, and Chen [7].
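Since G_n here is just a scaled chi-square c.d.f., the coefficients of (3.4) and (3.5) can be computed directly rather than read from tables; a sketch using SciPy, checked against the worked example in the text:

```python
from scipy.stats import chi2

def variance_ci_coeffs(gamma, k, i, df):
    """Coefficients of (3.4)-(3.5) for the i-th smallest variance.

    Upper interval (0, S2/b_n):   (1 - G_n(b_n))**(k-i+1) = gamma,
    lower interval (S2/a_n, inf): G_n(a_n)**i = gamma,
    where G_n is the c.d.f. of a chi-square/df r.v. with df degrees of freedom."""
    b_n = chi2.ppf(1.0 - gamma ** (1.0 / (k - i + 1)), df) / df
    a_n = chi2.ppf(gamma ** (1.0 / i), df) / df
    return a_n, b_n

a_n, b_n = variance_ci_coeffs(0.95, k=5, i=3, df=30)   # ≈ 1.6226, 0.5321
```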


Table (3.6). Values of b_n for 1 ≤ i ≤ k (1 ≤ k ≤ 5).

[The body of the table (rows indexed by d.f., columns by k − i + 1) could not be recovered from the scan.]

C. Binomial Parameter 

A binomial distribution belongs to the family of monotone likelihood ratio, which is also stochastically increasing in its parameter (see, e.g., Lehmann [9]), and it satisfies Assumption (2.2). Take n independent observations X_{i1}, . . ., X_{in} from Bernoulli population π_i with proportion (parameter) θ_i = P_i (0 < P_i < 1), i = 1, . . ., k; then the sample mean

X̄_i = Σ_{j=1}^{n} X_{ij}/n,

as an estimator of P_i, is a binomial r.v. with parameter P_i, i = 1, . . ., k. Let X̄_[1] ≤ . . . ≤ X̄_[k] denote the ranked values of X̄_1, . . ., X̄_k, and P_[1] ≤ . . . ≤ P_[k] the ranked values of P_1, . . ., P_k. The c.d.f. of X̄_i is given by

G_n(x; P_i) = 1 − I_{P_i}([nx]+1, n−[nx])

where I_{P_i}(·, ·) is the incomplete beta function and [y] is the largest integer ≤ y. We note that the observed value of nX̄_[i] will be an integer. Then, from (2.6) and (2.9), the end points h_1(X̄_[i]|γ_1)

Table (3.7). Values of a_n for 1 ≤ i ≤ k (1 ≤ k ≤ 5), for γ = 0.95 and γ = 0.99.

[The body of the table (rows indexed by d.f., columns by i) could not be recovered from the scan.]

and h_2(X̄_[i]|γ_2) of the intervals given by (2.4), (2.7), and (2.10) are determined by the unique solutions in P_[i] of the following equations, respectively:

(3.8) 1 − I_{P_[i]}(nX̄_[i]+1, n−nX̄_[i]) = γ_1^{1/i},

(3.9) 1 − I_{P_[i]}(nX̄_[i], n−nX̄_[i]+1) = 1 − γ_2^{1/(k−i+1)}.

The end points in (3.8) and (3.9) can be obtained by numerical methods from the incomplete beta function tables (see, e.g., K. Pearson [12]) and interpolation.
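With a numerical root-finder the table look-up and interpolation can be bypassed; a sketch for the lower endpoint, using the standard identity binom.cdf(t, n, p) = 1 − I_p(t+1, n−t) (the count t, sample size n, and level below are hypothetical):

```python
from scipy.stats import binom
from scipy.optimize import brentq

def binom_lower_endpoint(t, n, i, gamma1):
    """Solve G_n(Xbar_[i]; P) = gamma1**(1/i) for P, cf. (3.8), where
    t = n * Xbar_[i] is the observed integer count and
    G_n(x; P) = binom.cdf(t, n, P) = 1 - I_P(t+1, n-t)."""
    target = gamma1 ** (1.0 / i)
    # binom.cdf(t, n, p) decreases continuously from 1 toward 0 as p: 0 -> 1,
    # so the root is unique and bracketed
    return brentq(lambda p: binom.cdf(t, n, p) - target, 1e-12, 1.0 - 1e-12)

p_low = binom_lower_endpoint(t=12, n=20, i=2, gamma1=0.95)
```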

LEMMA (3.10): If X_1, . . ., X_l are independent r.v.'s, each having a binomial distribution (n, P_[i]), and

X̄_[l] = max{X̄_1, . . ., X̄_l},  X̄_[1] = min{X̄_1, . . ., X̄_l},

where X̄_i = X_i/n and X_i = Σ_j X_{ij}, then

E(X̄_[l]) = (1/n) Σ_{t=0}^{n−1} {1 − (Pr(t))^l},  E(X̄_[1]) = (1/n) Σ_{t=0}^{n−1} (1 − Pr(t))^l,

where

Pr(t) = 1 − I_{P_[i]}(t+1, n−t),  t = 0, 1, 2, . . ., n.

PROOF: By the properties P(X_[1] ≤ x) = 1 − (1 − Pr(x))^l and P(X_[l] ≤ x) = (Pr(x))^l, and using the result on p. 211 of Parzen [10], we have

E(X_[1]) = Σ_{t=0}^{n−1} (1 − Pr(t))^l

and

E(X_[l]) = Σ_{t=0}^{n−1} {1 − (Pr(t))^l}.
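The tail-sum identity in Lemma (3.10) is easy to verify numerically; the sketch below (hypothetical n, l, p) compares it with direct enumeration of the distribution of the maximum:

```python
from scipy.stats import binom

n, l, p = 5, 3, 0.4
F = lambda t: binom.cdf(t, n, p)          # Pr(t) in the notation of Lemma (3.10)

# E(max of l i.i.d. Binomial(n, p))/n via the tail-sum formula
e_max = sum(1.0 - F(t) ** l for t in range(n)) / n

# direct enumeration: P(max = t) = F(t)**l - F(t-1)**l
e_max_direct = sum(t * (F(t) ** l - F(t - 1) ** l) for t in range(n + 1)) / n
```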

LEMMA (3.11): For any i (1 ≤ i ≤ k),

Sup_{P∈Ω(P_[i])} E_P(X̄_[i]) = (1/n) Σ_{u=0}^{n−1} {1 − (Pr(u))^i},

Inf_{P∈Ω(P_[i])} E_P(X̄_[i]) = (1/n) Σ_{u=0}^{n−1} (1 − Pr(u))^{k−i+1}.

PROOF: This follows from Theorem (2.16) and Lemma (3.10).

LEMMA (3.12): For any i (1 ≤ i ≤ k),

lim_{P_[i]→0} Sup_P E_P(X̄_[i]) = 0 = lim_{P_[i]→0} Inf_P E_P(X̄_[i]),

lim_{P_[i]→1} Sup_P E_P(X̄_[i]) = 1 = lim_{P_[i]→1} Inf_P E_P(X̄_[i]).

PROOF: Pr(u) approaches 1 (0) as P approaches 0 (1); now use Lemma (3.11).

From Theorem (2.16) and Lemma (3.11) we have

(1/n) Σ_{u=0}^{n−1} (1 − Pr(u))^{k−i+1} ≤ E_P(X̄_[i]) ≤ (1/n) Σ_{u=0}^{n−1} {1 − (Pr(u))^i}.

D. Poisson Parameter 

The Poisson distribution with parameter λ belongs to the family of monotone likelihood ratio, which is stochastically increasing in λ. Take n independent observations X_{i1}, . . ., X_{in} from Poisson population π_i with parameter λ_i (0 < λ_i < ∞), i = 1, . . ., k; then the sample mean X̄_i = Σ_{j=1}^{n} X_{ij}/n, as an estimator of λ_i, is again a Poisson r.v. with parameter nλ_i, i = 1, . . ., k. Let λ_[i] denote the i-th smallest value of the λ's and X̄_[i] the i-th smallest value of the X̄'s, i = 1, . . ., k. The c.d.f. of X̄_i is given by

G_n(x; λ_i) = 1 − Γ_{nλ_i}([nx]+1)

where Γ_{nλ_i}(·) is the incomplete gamma function. Then, from (2.6) and (2.9), the end points of the intervals given by (2.4), (2.7), and (2.10) are determined by the unique solutions in nλ_[i] of the following equations, respectively:

(3.13) 1 − Γ_{nλ_[i]}(nX̄_[i]+1) = γ_1^{1/i},

(3.14) 1 − Γ_{nλ_[i]}(nX̄_[i]) = 1 − γ_2^{1/(k−i+1)}.

The end points in (3.13) and (3.14) can be obtained by numerical methods from the incomplete gamma function tables (see, e.g., K. Pearson [11]) and interpolation. From Theorem (2.16) and a lemma similar to Lemma (3.10) we have

Sup{E_λ(X̄_[i]); λ ∈ Ω(λ_[i])} = E_{λ_[i]}{max of X̄_(1), . . ., X̄_(i)},

Inf{E_λ(X̄_[i]); λ ∈ Ω(λ_[i])} = E_{λ_[i]}{min of X̄_(i), . . ., X̄_(k)}

= (1/n) Σ_{u=0}^{∞} (1 − Pr(u))^{k−i+1},

where

Pr(u) = 1 − Γ_{nλ_[i]}(u+1) = Σ_{a=0}^{u} e^{−nλ_[i]}(nλ_[i])^a/a!,  u = 0, 1, 2, . . . .
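As in the binomial case, a root-finder replaces the table work; a sketch for the lower endpoint, using the standard identity 1 − Γ_μ(t+1) = P(Poisson(μ) ≤ t) (the count t, sample size n, and level below are hypothetical):

```python
from scipy.stats import poisson
from scipy.optimize import brentq

def poisson_lower_endpoint(t, n, i, gamma1):
    """Solve 1 - Gamma_{n*lam}(t+1) = gamma1**(1/i) for lam, cf. (3.13),
    where t = n * Xbar_[i]; the left side equals poisson.cdf(t, n*lam)."""
    target = gamma1 ** (1.0 / i)
    # poisson.cdf(t, mu) decreases continuously from 1 toward 0 as mu grows,
    # so the root is unique and bracketed
    return brentq(lambda lam: poisson.cdf(t, n * lam) - target, 1e-12, 1e3)

lam_low = poisson_lower_endpoint(t=7, n=10, i=2, gamma1=0.95)
```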

E. Other Examples 

Other examples include the noncentrality parameter of a noncentral t-distribution and the correlation coefficient of a bivariate normal distribution.

ACKNOWLEDGMENT

The author wishes to thank Professor Edward J. Dudewicz and a referee for comments and suggestions on an earlier version of this paper.

REFERENCES


[1] Alam, K. and K. M. L. Saxena, "On Interval Estimation of a Ranked Parameter," Abstract, Bulletin of the Institute of Mathematical Statistics, 2 (3), 118 (May 1973).

[2] Alam, K., K. M. L. Saxena and Y. L. Tong, "Optimal Confidence Interval for a Ranked Parameter," Journal of the American Statistical Association, 68, 720-725 (September 1973).

[3] Chen, H. J. and E. J. Dudewicz, "Procedures for Fixed-Width Interval Estimation of the Largest Normal Mean," Journal of the American Statistical Association, 71, 752-756 (September 1976).

[4] Dudewicz, E. J., "Confidence Intervals for Ranked Means," Naval Research Logistics Quarterly, 17, 69-78 (March 1970).

[5] Dudewicz, E. J., "Two-Sided Confidence Intervals for Ranked Means," Journal of the American Statistical Association, 67, 462-464 (June 1972).

[6] Dudewicz, E. J., "Point Estimation of Ordered Parameters: The General Location Parameter Case," Tamkang Journal of Mathematics, 3, 101-114 (November 1972).

[7] Dudewicz, E. J., J. S. Ramberg and H. J. Chen, "New Tables for Multiple Comparisons with a Control (Unknown Variances)," Biometrische Zeitschrift, 17, 13-26 (1975).

[8] Dudewicz, E. J. and Y. L. Tong, "Optimal Confidence Intervals for the Largest Location Parameter," in Statistical Decision Theory and Related Topics, ed. S. S. Gupta and J. Yackel, 363-375 (Academic Press, Inc., New York, 1971).

[9] Lehmann, E. L., Testing Statistical Hypotheses (John Wiley and Sons, Inc., New York, 1959).

[10] Parzen, E., Modern Probability Theory and Its Applications (John Wiley and Sons, Inc., New York, 1960; Fourth Printing, March 1963).

[11] Pearson, K., Tables of the Incomplete Γ-Function, Computed by the Staff of the Department of Applied Statistics, University of London (London, 1957).

[12] Pearson, K., Tables of the Incomplete Beta Function (Cambridge University Press, London, 2nd ed., 1968).

[13] Rizvi, M. H. and K. M. L. Saxena, "On Interval Estimation and Simultaneous Selection of Ordered Location and Scale Parameters," Annals of Statistics, 2, 1340-1345 (November 1974).

[14] Saxena, K. M. L., "Interval Estimation of the Largest Variance of k Normal Populations," Journal of the American Statistical Association, 66, 408-410 (June 1971).

[15] Saxena, K. M. L. and Y. L. Tong, "Interval Estimation of the Largest Mean of k Normal Populations with Known Variance," Journal of the American Statistical Association, 64, 296-299 (March 1969).

[16] Saxena, K. M. L. and Y. L. Tong, "Optimum Interval Estimation for the Largest Scale Parameter," Abstract, in Optimizing Methods in Statistics, ed. J. S. Rustagi, 477 (Academic Press, Inc., New York and London, 1971).


M/M/1 QUEUE WITH DELAYED FEEDBACK*

Ingjaldur Hannibalsson

Ohio State University
Columbus, Ohio

Ralph L. Disney

University of Michigan
Ann Arbor, Michigan


We present some results for M/M/1 queues with finite capacities and delayed feedback. The delay in the feedback to an M/M/1 queue is modelled as another M-server queue with a finite capacity. The steady state probabilities for the two dimensional Markov process (N(t), M(t)) are solved, where N(t) = queue length at server 1 at t and M(t) = queue length at server 2 at t. It is shown that a matrix operation can be performed to obtain the steady state probabilities. The eigenvalues of the operator and its eigenvectors are found. The problem is solved by fitting boundary conditions to the general solution and by normalizing. A sample problem is run to show that the solution methods can be programmed and meaningful results obtained numerically.


The problem that will be examined in this paper is one in the class of networks of queues with delayed feedback. We model the system as follows. The system has two servers. It is assumed that customers arrive at server I according to a Poisson process with rate λ. Server I has a waiting space for N customers, where N will be assumed to be finite. Service is on a first come, first served basis. The service times are exponentially distributed with mean 1/μ_1. When a customer has completed service at server I, with probability p he goes to server II and with probability q = 1 − p he leaves the system. Server II has a waiting space for M customers, and M will be assumed to be finite. The service times at server II are assumed to be exponentially distributed with mean 1/μ_2. The service is on a first come, first served basis. When a customer has completed service at server II he goes back to server I with probability 1. This system will be analyzed when blocking occurs (i.e., when a customer who is supposed to enter the other server occupies his own server until a space at the other server becomes empty). Only steady state probabilities are obtained.

The second server provides the delay in feedback. While such servers physically exist in some 
systems, our interest in them is only to have them serve as one means of providing a delay in 
feeding back. The delay times then are random and correspond to the time the job spends at the 
second server. If the blocking time at the second server is neglected in determining the feedback 
time, then the amount of delay is simply the total time spent at the second server. 

*This research was supported in part by the United States Office of Naval Research under contract number 
N00014-75-C-0492 (NR 042-296). Reproduction in whole or in part is permitted for any purpose of the United 
States Government. 




The case in which M, N = ∞ has been studied by Jackson [3], who found that the system acted as two independent M/M/1 queues. Independence is lost, however, in the problem stated above. Other work that we can find that might include the stated problem as a special case (e.g. [1, 5]) likewise treats the case M = N = ∞ or does not provide for blocking [4]. Since the blocking phenomenon itself is of some practical importance, we have chosen to keep M, N finite. For M → ∞, the problem can be solved using a limiting argument on the results to follow when N < ∞. For N → ∞, M < ∞, the following methods involve infinite dimensional matrices; whether the methods of this paper go through in that case is presently unknown. The case where blocking does not occur and therefore some jobs do not gain access to the waiting lines, i.e. overflow, has been studied; since the analysis closely parallels that given in the following study we omit it here. Full results are available in [2].


The queue length problem is concerned with a two dimensional, irreducible Markov chain 
(N(t), M{t)) with state space ((n, m) : 0<n<N<^°°, and 0<ra<M<oo). A state (n, m) of the 
system denotes n customers at server I and m customers at server 77. N and M will be assumed to 
be finite. The probability of being in state (n, m) is denoted by p n , m . Both servers are empty if 
(n, m) = (0, 0). 

It assumed that a customer, who has completed service in server I and is supposed to feed 
back, blocks server I if he cannot enter the waiting space of server II because it already contains 
M customers. It is also assumed that a customer who has completed service in server II blocks 
server II if he cannot enter the waiting space of server I because it already contains N customers. 
Arriving customers who find N present in the waiting space of server I are cleared. 

(N(t), M(t)) is a finite, irreducible Markov chain, and A is the infinitesimal generator of the process.
In this case special blocking states have to be defined. The state in which there are i customers at
server I, one of which is waiting for an empty space at server II, is denoted by (i−1, M+1),
i = 1, …, N. The state in which there are j customers at server II, one of which is waiting for an
empty space at server I, is denoted by (N+1, j−1), j = 1, …, M.

By lexicographically ordering the states, the A matrix, which is ((N+1)(M+1)+N+M) ×
((N+1)(M+1)+N+M) dimensional, can be written as follows:

        [ −A₀   A₃                                ]
        [  A₁  −A₂   A₃                           ]
        [       A₁  −A₂   A₃                      ]
    A = [            ·     ·     ·                ]
        [            A₁  −A₂   A₃                 ]
        [                 A₁  −A₄   A₉            ]
        [                      A₇  −A₅   A₁₀      ]
        [                           A₈  −A₆       ]

M/M/1 QUEUE WITH DELAYED FEEDBACK 283

A₀ is an (M+2)×(M+2) diagonal matrix with λ for its first diagonal element and λ+μ₂ for
all other diagonal elements.

A₁ is an (M+2)×(M+2) bidiagonal matrix with qμ₁ everywhere on the main diagonal except
that the last term is 0. pμ₁ lies everywhere on the first superdiagonal.

A₂ is an (M+2)×(M+2) diagonal matrix with λ+μ₁ for its first diagonal term, λ+μ₂ for its
last diagonal term and λ+μ₁+μ₂ everywhere else on the diagonal.

A₃ is an (M+2)×(M+2) bidiagonal matrix with λ everywhere on the main diagonal and μ₂ on
the first subdiagonal.

A₄ is an (M+2)×(M+2) diagonal matrix with λ+μ₁ for its first element and λ+μ₁+μ₂ for the
remaining diagonal elements except the last, which is μ₂.

A₅ is an (M+1)×(M+1) diagonal matrix with μ₁ for its first element and μ₁+μ₂ elsewhere on
the diagonal.

A₆ is an M×M diagonal matrix with μ₁ on the diagonal.

A₇ is an (M+1)×(M+2) bidiagonal matrix with qμ₁ on the main diagonal and pμ₁ on the first
superdiagonal.

A₈ is an M×(M+1) bidiagonal matrix with the same diagonal elements as A₇.

A₉ is an (M+2)×(M+1) bidiagonal matrix with λ on the main diagonal and μ₂ on the first
subdiagonal.

A₁₀ is an (M+1)×M matrix with μ₂ everywhere on the first subdiagonal and zero everywhere
else.

As the Markov process is finite and irreducible, there exists a unique probability vector P, with
its elements ordered lexicographically, satisfying

(2.1a) PA = 0,  (2.1b) 0 ≤ P ≤ 1,  (2.1c) Σₙ Σₘ p_{n,m} = 1.

The elements of P are the usual steady state probabilities for a finite dimensional Markov
chain.

Letting P = (P₀, P₁, …, P_N, P_{N+1}), where P_k = (p_{k,0}, p_{k,1}, …, p_{k,M}, p_{k,M+1}) for k = 0, 1, …,
N−1, P_N = (p_{N,0}, p_{N,1}, …, p_{N,M}) and P_{N+1} = (p_{N+1,0}, p_{N+1,1}, …, p_{N+1,M−1}), (2.1.a) can be rewritten

(2.2.a) P_N A₁₀ − P_{N+1} A₆ = 0,

(2.2.b) P_{N−1} A₉ − P_N A₅ + P_{N+1} A₈ = 0,

(2.2.c) P_{N−2} A₃ − P_{N−1} A₄ + P_N A₇ = 0,

(2.2.d) P_k A₃ − P_{k+1} A₂ + P_{k+2} A₁ = 0,  k = N−3, …, 1, 0,

(2.2.e) −P₀ A₀ + P₁ A₁ = 0.
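The dynamics behind the generator can be checked numerically. The sketch below is a minimal illustration, not the paper's computation: it builds the generator state-by-state (equivalent to assembling A from the blocks A₀–A₁₀), then solves PA = 0 by uniformization and power iteration. All parameter values in the usage are assumptions for illustration; in particular μ₂ = 2 merely stands in for the value that is illegible in the Section 7 example.

```python
def build_generator(lam, mu1, mu2, p, N, M):
    """Generator of the feedback model with blocking: regular states (n, m);
    server-I-blocked states (n, M+1), n = 0..N-1; server-II-blocked states
    (N+1, m), m = 0..M-1.  Feedback probability p, departure probability q."""
    q = 1.0 - p
    states = [(n, m) for n in range(N + 1) for m in range(M + 1)]
    states += [(n, M + 1) for n in range(N)]       # server I blocked
    states += [(N + 1, m) for m in range(M)]       # server II blocked
    idx = {s: k for k, s in enumerate(states)}
    Q = [[0.0] * len(states) for _ in states]

    def add(src, dst, rate):                       # off-diagonal rate + diagonal
        Q[idx[src]][idx[dst]] += rate
        Q[idx[src]][idx[src]] -= rate

    for (n, m) in states:
        if n == N + 1:                             # server II blocked
            add((n, m), (N, m), q * mu1)           # I completes, customer departs
            add((n, m), (N, m + 1), p * mu1)       # I completes, feeds back to II
        elif m == M + 1:                           # server I blocked
            if n < N - 1:
                add((n, m), (n + 1, m), lam)       # arrival joins queue I
            add((n, m), (n + 1, M), mu2)           # II completes; I unblocks
        else:                                      # regular state (n, m)
            if n < N:
                add((n, m), (n + 1, m), lam)       # arrival (else cleared)
            if n >= 1:
                add((n, m), (n - 1, m), q * mu1)   # served at I, departs
                if m < M:
                    add((n, m), (n - 1, m + 1), p * mu1)   # feedback to II
                else:
                    add((n, m), (n - 1, M + 1), p * mu1)   # server I blocks
            if m >= 1:
                if n < N:
                    add((n, m), (n + 1, m - 1), mu2)       # II feeds back to I
                else:
                    add((n, m), (N + 1, m - 1), mu2)       # server II blocks
    return states, Q

def steady_state(Q, tol=1e-12, max_iter=200000):
    """Solve pi Q = 0 via the uniformized chain P = I + Q/Lam (power iteration)."""
    n = len(Q)
    Lam = max(-Q[i][i] for i in range(n)) + 1.0
    P = [[(1.0 if i == j else 0.0) + Q[i][j] / Lam for j in range(n)]
         for i in range(n)]
    pi = [1.0 / n] * n
    for _ in range(max_iter):
        new = [sum(pi[i] * P[i][j] for i in range(n)) for j in range(n)]
        if max(abs(a - b) for a, b in zip(new, pi)) < tol:
            return new
        pi = new
    return pi
```

With N = 3 and M = 1 this gives the 12 states of the Section 7 example; summing the probabilities of the states (n, M+1) gives the probability that server I is blocked, and the probability of (N+1, 0) the probability that server II is blocked.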

If (2.2) is augmented by

(2.3) P₋₁ A₃ = P₀ (A₂ − A₀) = P₀ H

and (2.3) is added to (2.2.e), the following system of equations results:

(2.4.a) P_N A₁₀ − P_{N+1} A₆ = 0,

(2.4.b) P_{N−1} A₉ − P_N A₅ + P_{N+1} A₈ = 0,

(2.4.c) P_{N−2} A₃ − P_{N−1} A₄ + P_N A₇ = 0,

(2.4.d) P_k A₃ − P_{k+1} A₂ + P_{k+2} A₁ = 0,  k = N−3, …, 1, 0, −1,

(2.4.e) P₋₁ A₃ = P₀ H.

When (2.4.c), (2.4.d), and (2.4.e) are multiplied from the right by A₃⁻¹, which exists since λ > 0,
the following system of equations results:

(2.5.a) P_N A₁₀ − P_{N+1} A₆ = 0,

(2.5.b) P_{N−1} A₉ − P_N A₅ + P_{N+1} A₈ = 0,

(2.5.c) P_{N−2} − P_{N−1} C₄ + P_N C₇ = 0,

(2.5.d) P_k − P_{k+1} C₂ + P_{k+2} C₁ = 0,  k = N−3, …, 1, 0, −1,

(2.5.e) P₋₁ = P₀ H_c,

where C₁ = A₁A₃⁻¹, C₂ = A₂A₃⁻¹, C₄ = A₄A₃⁻¹, C₇ = A₇A₃⁻¹ and H_c = HA₃⁻¹.


Let X_n = (P_n, P_{n−1}), n = 0, 1, …, N−1, be a 2M+4 dimensional row vector, X_N = (P_N, P_{N−1})
a 2M+3 dimensional vector and X_{N+1} = (P_{N+1}, P_N) a 2M+2 dimensional vector. Define a
(2M+4)×(2M+4) matrix B, where

    B = [ 0   −C₁ ]
        [ I    C₂ ].

It follows from equation (2.5.d) that

(3.1) X_{n−1} = X_n B,  n = N−1, …, 2, 1.

Using simple iteration on (3.1),

(3.2) X_n = X_{N−1} B^{N−n−1},  n = N−1, …, 1, 0.

Thus (3.2) determines the vector X_n, n = N−1, …, 1, 0, in terms of the elements of X_{N−1}.
Equations (2.5.a), (2.5.b) and (2.5.c) relate P_{N−2}, P_{N−1}, P_N and P_{N+1}. When (2.5.a) is written out
explicitly one gets

(3.3) μ₂ p_{N,i} = μ₁ p_{N+1,i−1},  i = 1, 2, …, M.

Thus the components of P_N can be written in terms of the components of P_{N+1} and p_{N,0}. When (2.5.b)
is written out explicitly one gets

λp_{N−1,0} + μ₂p_{N−1,1} − μ₁p_{N,0} + qμ₁p_{N+1,0} = 0,

λp_{N−1,i} + μ₂p_{N−1,i+1} − (μ₁+μ₂)p_{N,i} + pμ₁p_{N+1,i−1} + qμ₁p_{N+1,i} = 0,  i = 1, 2, …, M−1,

λp_{N−1,M} + μ₂p_{N−1,M+1} − (μ₁+μ₂)p_{N,M} + pμ₁p_{N+1,M−1} = 0.

By using (3.3) it is possible to solve this system of equations in terms of the components of P_{N+1},
p_{N,0} and p_{N−1,0}. Then by using (2.5.c) one obtains P_{N−2} in terms of the components of P_{N+1}, p_{N,0} and



p_{N−1,0}. Thus X_n can be written uniquely in terms of the components of P_{N+1}, p_{N,0} and p_{N−1,0}, for
n = 0, 1, …, N+1. Therefore P_n can be written uniquely in terms of the components of P_{N+1}, p_{N,0}
and p_{N−1,0}, and by using (2.1.c) and (2.5.e) P_n can be determined uniquely for n = 0, 1, …, N+1.

The major problem is to determine B^{N−n−1} explicitly. A similarity transformation on B as
B = RJR⁻¹ is sought. In Section 4 it will be shown that B has 2M+4 distinct eigenvalues, two of
which are 0 and 1. Thus J is diagonal. In Section 5 it is shown that both R and R⁻¹ can be written
out explicitly, and finally in Section 6 the boundary conditions are fitted.
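The point of the similarity transformation — that powers of B reduce to powers of the diagonal J — can be illustrated on a toy matrix. The 2×2 matrix below is purely hypothetical (it is not the B of the model); its eigendecomposition is worked out by hand so the sketch stays self-contained.

```python
def matmul(A, B):
    """Plain list-of-lists matrix product."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def matpow(A, k):
    """A**k by repeated multiplication, for comparison."""
    out = [[float(i == j) for j in range(len(A))] for i in range(len(A))]
    for _ in range(k):
        out = matmul(out, A)
    return out

# Toy matrix with the distinct eigenvalues 2 and 3, diagonalized by hand:
B = [[2.0, 0.0],
     [1.0, 3.0]]
R = [[1.0, 0.0],        # columns are right eigenvectors of B
     [-1.0, 1.0]]
Rinv = [[1.0, 0.0],     # R * Rinv = I
        [1.0, 1.0]]
J = [[2.0, 0.0],        # diagonal matrix of eigenvalues
     [0.0, 3.0]]

def power_via_eigen(k):
    """B**k computed as R J**k R^{-1}; J**k is just elementwise powers."""
    Jk = [[J[0][0] ** k, 0.0], [0.0, J[1][1] ** k]]
    return matmul(matmul(R, Jk), Rinv)
```

For the B of the model, once the 2M+4 distinct eigenvalues of Section 4 and the eigenvector matrices of Section 5 are in hand, B^{N−n−1} costs only scalar powers of the eigenvalues.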


In this section the eigenvalues of B will be found.

LEMMA 4.1: The invariant polynomials of B that are not identically 1 are those of the z-
matrix z²I − zC₂ + C₁.

PROOF: The characteristic matrix for B is

    B(z) = (zI − B) = [ zI      C₁     ]
                      [ −I   zI − C₂   ].

Let

    S(z) = [ 0   −I ]          T(z) = [ I   zI − C₂ ]
           [ I   zI ]    and          [ 0      I    ].

S(z) and T(z) were determined in order to have

    S(z)B(z)T(z) = [ I          0         ]
                   [ 0   z²I − zC₂ + C₁   ].

As S(z)B(z)T(z) is an equivalence transformation of B(z) and since equivalent matrices have the
same invariant polynomials, the invariant polynomials of B(z) that are not identically 1 are
among those of the matrix z²I − zC₂ + C₁. By performing row and column operations, z²I − zC₂ + C₁
can be written in the following form,

    [ a  e                ]
    [ d  b  e             ]
    [    d  b  e          ]
    [       ·  ·  ·       ]
    [          d  b  e    ]
    [             d  c    ],

which is (M+2)×(M+2) dimensional, where

    a = z² − (λ+μ₁)z/λ + qμ₁/λ,
    b = z² − (λ+μ₁+μ₂)z/λ + qμ₁/λ,
    c = z² − (λ+μ₂)z/λ,
    d = μ₂z²/λ,
    e = pμ₁/λ.

Let D_n, an n×n determinant, be defined recursively by

    D₁ = z(z−1),

    D₂ = z(z−1)(z² − (λ+μ₁+μ₂)z/λ + qμ₁/λ + qμ₁μ₂z²/λ²),

(4.1)  D_n = (z² − (λ+μ₁+μ₂)z/λ + qμ₁/λ)D_{n−1} − (pμ₁μ₂z²/λ²)D_{n−2},  n = 3, 4, …, M+2.

LEMMA 4.2:

    D_{M+2} = |z²I − zC₂ + C₁|.

PROOF: This is proved by performing elementary row and column operations on D_{M+2}.

K_n is defined to be K_n(z) = D_n/z(z−1).

The z argument will be omitted when not necessary. Clearly

    K_n = (z² − (λ+μ₁+μ₂)z/λ + qμ₁/λ)K_{n−1} − (pμ₁μ₂z²/λ²)K_{n−2},  n = 3, 4, …, M+2,

    K₂ = z² − (λ+μ₁+μ₂)z/λ + qμ₁/λ + qμ₁μ₂z²/λ²,

    K₁ = 1.
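The three-term recursion is easy to carry out on polynomial coefficient lists. The sketch below uses the recursion coefficients as reconstructed here; since the printed formulas are partly illegible, treat the exact low-order terms as an assumption. As a consistency check, when pμ₁ = μ₂ the values K_n(1) produced by this recursion satisfy the closed form (−1)^{n−1}(pμ₁/λ)^{n−1}(qμ₁/λ + n(1 − qμ₁/λ)) of formula (4.2.b) below.

```python
def poly_add(p, q):
    n = max(len(p), len(q))
    return [(p[i] if i < len(p) else 0.0) + (q[i] if i < len(q) else 0.0)
            for i in range(n)]

def poly_mul(p, q):
    out = [0.0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return out

def poly_eval(p, z):
    return sum(c * z ** i for i, c in enumerate(p))

def K_polys(lam, mu1, mu2, p, n_max):
    """K_1, ..., K_{n_max} as coefficient lists in ascending powers of z,
    via the three-term recursion (coefficients reconstructed from the text)."""
    q = 1.0 - p
    # multiplier T(z) = z^2 - ((lam+mu1+mu2)/lam) z + q*mu1/lam
    T = [q * mu1 / lam, -(lam + mu1 + mu2) / lam, 1.0]
    # subtrahend S(z) = (p*mu1*mu2/lam^2) z^2
    S = [0.0, 0.0, p * mu1 * mu2 / lam ** 2]
    K = [[1.0],                                        # K_1 = 1
         [q * mu1 / lam, -(lam + mu1 + mu2) / lam,
          1.0 + q * mu1 * mu2 / lam ** 2]]             # K_2
    while len(K) < n_max:
        minus_SK = [-c for c in poly_mul(S, K[-2])]
        K.append(poly_add(poly_mul(T, K[-1]), minus_SK))
    return K
```

Each recursion step raises the degree by two, so K_n has degree 2n−2, matching the root count claimed in Theorem 4.5.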

The following values of K_n(z) will be useful later. We list them here for reference and call
them formula (4.2).

(4.2)
    (a) K_n(1) = (−1)^{n+1}((pμ₁/λ)ⁿ(q/p)/(μ₂ − pμ₁))((μ₂ − λp/q) + (λp/q − μ₁p)(μ₂/pμ₁)ⁿ),
        n = 1, 2, …, M+2, when pμ₁ ≠ μ₂;
    (b) K_n(1) = (−1)^{n−1}(pμ₁/λ)^{n−1}(qμ₁/λ + n(1 − qμ₁/λ)),  n = 1, 2, …, M+2, when pμ₁ = μ₂;
    (c) K_n(qμ₁/λ) = (−1)^{n−1}(pμ₁²/λ²)^{n−1},  n = 1, 2, …, M+2.

In order to show that K_{M+2} has 2M+2 distinct roots, which implies that D_{M+2} has 2M+4
distinct roots, it has to be shown how K_n(1) changes sign as a function of n. Due to the complexity
of K_n(1) several cases have to be analyzed. Let

    h_n = |(pμ₁/λ)ⁿ(q/p)/(μ₂ − pμ₁)|,  a = (μ₂ − λp/q),  b = (λp/q − μ₁p),
    c = (μ₂/pμ₁),  d = qμ₁/λ  and  e = (1 − qμ₁/λ).

The cases when pμ₁ ≠ μ₂ are shown in Table 1.

Table 1. The form of K_n(1) when pμ₁ ≠ μ₂

    cases 1–8:  (−1)^{n+1} h_n(±|a| ± |b|cⁿ), according to the signs of a and b;
    degenerate cases:  (−1)^{n+1} h_n|b|cⁿ  (when a = 0)  and  (−1)^{n+1} h_n|a|  (when b = 0).

[The columns of the printed table distinguishing the cases are not legible in this reproduction.]
When μ₂ − pμ₁ = a + b > 0, c > 1. Thus cases 4 and 5 are not possible. The cases when pμ₁ = μ₂ are shown
in Table 2. Let d = qμ₁/λ and e = (1 − qμ₁/λ).

Table 2. The form of K_n(1) when pμ₁ = μ₂

case    The form of K_n(1)                         d     e

1       (−1)^{n+1}(pμ₁/λ)^{n−1}(+d + ne)          >0    >0

2       (−1)^{n+1}(pμ₁/λ)^{n−1}(+d − ne)          >0    <0

3       (−1)^{n+1}(pμ₁/λ)^{n−1}(+d)               >0    =0

As the analyses of all cases are similar, only the analysis for case 1, when pμ₁ ≠ μ₂, will be shown.
Results for all the others follow a similar pattern. As b = λp/q − μ₁p > 0, it is clear that qμ₁/λ < 1.

LEMMA 4.3: K₂(z) has two roots α₁ and α₂, one of which lies in [0, qμ₁/λ] and the other in
[1, ∞).

PROOF: The proof is easy since K₂(z) is a quadratic function of z.

LEMMA 4.4: K₃(z) has four roots. Two lie in [0, qμ₁/λ] and two lie in [1, ∞). The four roots
are distinct.

PROOF: By the definition of K₃(z), formula (4.1), it is clear that K₃(0) > 0 and K₃(α₁) < 0. From
property (4.2.c), K₃(qμ₁/λ) > 0. Thus K₃(z) has at least two roots in [0, qμ₁/λ] and α₁ separates two of
them. From property (4.2.a) and the definition of K₃(z), K₃(1) > 0, K₃(α₂) < 0. Furthermore the
definition of K₃(z) implies K₃(z) → ∞ as z → ∞. Thus K₃(z) has at least two roots in [1, ∞) and
α₂ separates two of them. Since K₃(z) has only four roots, two lie in [0, qμ₁/λ] and two lie in [1, ∞),
and the four roots are distinct.



THEOREM 4.5: K_n(z) has 2n−2 distinct roots γ₁, γ₂, …, γ_{2n−2}, n−1 of which are in [0, qμ₁/λ] and
n−1 of which are in [1, ∞). If η₁, η₂, …, η_{2p−4} are the roots of K_{p−1}(z) and φ₁, φ₂, …, φ_{2p−4}, φ_{2p−3},
φ_{2p−2} are the roots of K_p(z), one has

    φ₁ < η₁ < φ₂ < η₂ < … < η_{2p−4} < φ_{2p−2} < ∞.

PROOF: This theorem is proved by induction from Lemma 4.4.

Using the same technique it is easy to show for all the other cases that K_{M+2} has 2M+2 distinct
roots γ₁, …, γ_{2M+2}.



To find the right hand eigenvectors of B one looks at

(5.1) (B − γI)x = 0,

and for the left hand eigenvectors at

(5.2) y(B − γI) = 0.

First the right hand eigenvectors will be found. By performing row operations on (B − γI),
(5.1) reduces to the following system of equations:

    x₀ + (c − γ)x_{M+2} = 0,

    x₁ − (μ₂/λ)(d/c)γx_{M+2} + (d − γ)x_{M+3} + (μ₂/λ)(d/c)x₀ = 0,

    x_i − (μ₂/λ)γx_{M+i+1} + (d − γ)x_{M+i+2} + (μ₂/λ)x_{i−1} = 0,  i = 2, 3, …, M,

    x_{M+1} − (μ₂/λ)(e/d)γx_{2M+2} + (e − γ)x_{2M+3} + (μ₂/λ)(e/d)x_M = 0,

    (a/c)γx_{M+2} + bx_{M+3} − (a/c − γ)x₀ = 0,

    (a/d)γx_{M+i} + bx_{M+i+1} − (a/d − γ)x_{i−2} = 0,  i = 3, …, M+2,

where

    a = (μ₁/λ)(q − pμ₂/λ),  b = pμ₁/λ,  c = (λ+μ₁)/λ,  d = (λ+μ₁+μ₂)/λ,  e = (λ+μ₂)/λ.

The solution can be obtained using standard difference equation techniques by using equations
(5.3)–(5.5) and (5.7)–(5.9). Then (5.6) is satisfied if and only if γ is an eigenvalue.

To find the left hand eigenvectors, column operations are performed on (B − γI). The sys-
tem of equations then reduces to the following system of equations:

(5.10) γy_i = y_{M+i+2},  i = 0, 1, …, M+1,

(5.11) (c − γ)y_{M+2} − (μ₂/λ)γy_{M+3} − (a + μ₂b/λ)y₀ = 0,

(5.12) (d − γ)y_{M+i+2} − (μ₂/λ)γy_{M+i+3} − by_{i−1} − (a + μ₂b/λ)y_i = 0,  i = 1, 2, …, M,

(5.13) (e − γ)y_{2M+3} − by_M = 0,

where a, b, c, d and e are the same as before. The solution can be obtained using standard difference
equation techniques: equations (5.10)–(5.12) are used, and then (5.13) is satisfied if and only if
γ is an eigenvalue.


It has already been shown that

(6.1) X₀ = X_{N−1}B^{N−1} = X_{N−1}RJ^{N−1}R⁻¹.

When the definition of X_i is used, (6.1) turns out to be equivalent to

(6.2) (P₀, P₋₁) = (P_{N−1}, P_{N−2}) RJ^{N−1}R⁻¹.

P_{N−1} and P_{N−2} can be written in terms of the components of P_{N+1}, p_{N,0} and p_{N−1,0}. (6.2) contains
2M+4 scalar equations. The variables in those equations are p_{N+1,0}, …, p_{N+1,M−1}, p_{N,0}, p_{N−1,0}, p_{0,0},
…, p_{0,M+1} and p_{−1,0}, …, p_{−1,M+1}. (6.2) can be solved for p_{0,0}, …, p_{0,M+1} and p_{−1,0}, …, p_{−1,M+1} in
terms of p_{N+1,0}, …, p_{N+1,M−1}, p_{N,0} and p_{N−1,0}. From (2.5.e) one obtains p_{−1,0}, …, p_{−1,M+1} as linear
functions of p_{0,0}, …, p_{0,M+1}. When the results of (6.2) are substituted into (2.5.e) one obtains M+2
linear equations, M+1 of which are independent. These equations determine p_{N+1,0}, …, p_{N+1,M−1},
p_{N,0} and p_{N−1,0} within a multiplicative constant. Then by using (2.1.c)
the normalization constant can be determined. Thus the probability vector P has been determined.

THEOREM 6.1: The finite feedback problem with blocking, defined by (2.1), (2.5) and the
corresponding matrices defined in Section 2, has a unique probability solution (X_n),

    X_n = X_{N−1}B^{N−n−1},  n = N−1, …, 1, 0,

where B = RJR⁻¹, whose terms have been determined in Section 5. Furthermore X₀ = (P₀, P₋₁) sat-
isfies (6.2), which together with (2.5.e) and (2.1.c) determines X_{N−1} uniquely.
PROOF: The proof is contained in Sections 4, 5, and 6.






In this section an example is given. It is assumed that N = 3 and M = 1, with λ = 1, μ₁ = 4,
p = q = 1/2, and μ₂ [value illegible in this copy]. The B matrix, the diagonal matrix J of its
eigenvalues, and the matrices R and R⁻¹ were computed numerically; their entries are garbled in
this reproduction and are not reproduced here.
Within numerical accuracy the probability distribution is found to be : 

Table 3. The steady state probabilities when N = 3 and M = 1.

    n \ m       0        1      blocked    row sum

    0        0.2851   0.1430   0.0448     0.4729
    1        0.1444   0.0677   0.0450     0.2571
    2        0.0755   0.0451   0.0525     0.1731
    3        0.0489   0.0300      —       0.0789
    4        0.0150      —        —       0.0150

    total    0.5689   0.2858   0.1423

(The "blocked" column contains the server I blocking states (n, M+1); the row n = 4 is the
server II blocking state (N+1, 0).)


From Table 3 one obtains the probability that server I is blocked as 0.1423 and the probability 
that server II is blocked as 0.0150. 


This work was completed while the second author was Distinguished Visiting Professor at the
Ohio State University. He would like to thank the faculty, staff and students of that university
for their help in preparing this paper.


[1] Arya, K. L., "Systems of Two Servers in Bi-Series with a Serial Service Channel and Phase
Type Service," Zeitschrift für Operations Research, 14, 115 (1970).

[2] Hannibalsson, I., "Networks of Queues with Delayed Feedback," Technical Report 75-10,
Department of Industrial and Operations Engineering, University of Michigan (June 1975).

[3] Jackson, J. R., "Networks of Waiting Lines," Operations Research, 5, 518 (1957).

[4] Jackson, J. R., "Jobshop-Like Queueing Systems," Management Science, 10, 131 (1963).

[5] Maggu, P. L., "Phase Type Service Systems with Two Servers in Bi-Series," Journal of the Opera-
tions Research Society of Japan, 4, 505 (1962).


Wayne Winston 

Indiana University 
Bloomington, Indiana


We consider a queuing system in which both customers and servers may be
of several types. The distribution of a customer's service time is assumed to depend
on both the customer's type and the type of server to which he is assigned. For a
model with two servers and two customer types, conditions are presented which
ensure that the discounted number of service completions is maximized by assigning
customers with longer service times to faster servers. Generalizations to more
complex models are discussed.


For reasons of analytic simplicity, most modelers of congested systems assume that all servers
are identical. There are many situations in which such an assumption is unrealistic. For example,
in a hospital the assumption of homogeneous servers equates a semi-private room with a coronary
care unit. In a supermarket, the assumption of identical servers ignores the express lane and the
different rates at which cashiers work. A queuing system in which both servers and customers
are of several types will be called a heterogeneous queuing system. In a heterogeneous queuing sys-
tem the method used to assign customers to servers is an important aspect of the system's opera-
tion. Let the state of the system be defined by knowledge of the type of customer (if any) occupying
each server. Most prior work on the assignment of customers in heterogeneous congestion systems
assumes that customers are assigned to servers according to rules that are independent of the
state of the system (cf. Kotiah and Slater [6] and Rolfe [8]). In this paper the assignment rules
under consideration will depend on the system state.


Consider a queuing system consisting of s ≥ 1 servers. Customers of type i (i = 1, 2, …, r) arrive
at rates λ_i according to independent Poisson processes. Upon arrival, a customer must be assigned
to an idle server; if all servers are occupied, then an arrival is lost to the system. A type i customer
who is assigned to server j completes service according to an exponential distribution with param-
eter μ_{ij}. A unit reward is earned whenever a customer completes service. Rewards are assumed


294 W. WINSTON

to be discounted by a factor α, so a unit reward earned at time t is equivalent to a reward of e^{−αt}
earned at time 0. The goal is to assign customers to servers so as to maximize the expected dis-
counted reward earned over an infinite horizon.

As an example of a situation where the above model may be applicable let the servers be fire 
engines and the customers be fire alarms. Then a fire engine must be dispatched whenever an alarm 
is recorded. An alarm that is recorded when no fire engine is available is considered to be lost to the 
system because the chance of controlling a fire is assumed to be nil if an engine is not immediately 
dispatched. In this context, the μ_{ij}'s would depend on the location of the alarms and fire engines.
For further work on the problem of dispatching fire engines the reader is referred to [1]. 

It is clear that the problem of determining an optimal assignment policy may be formulated
as a continuous time Markov decision process (CTMDP); see Howard [4]. To do so define
I_r = {0, 1, …, r} and let (I_r)^s be the s-fold Cartesian product of I_r. Finally, for C = (n₁, n₂, …, n_s) ∈
(I_r)^s, let X(C) = {i | n_i = 0}. Then the relevant CTMDP is in state C = (n₁, …, n_s) ∈ (I_r)^s whenever
server i is occupied by a type n_i customer (n_i = 0 indicating that server i is unoccupied). For any
state C having X(C) ≠ { } the set of possible actions is F_C, the set of all functions mapping I_r − {0} →
X(C); for all C such that X(C) = { } the only action is a dummy action, labeled s+1, which has no
effect on the process. When action s+1 is chosen all arrivals are turned away and are lost to the
system. If an action f ∈ F_C such that f(i) = k is chosen, then any type i arrival who finds the system
in state C will be assigned to server k.

We now assume that the μ_{ij}'s satisfy the following conditions:

(1) μ_{il} + μ_{jk} ≥ μ_{ik} + μ_{jl},  j > i, l > k,

(2) μ_{jk} ≥ μ_{ik},  j > i,

(3) μ_{ij} ≥ μ_{ik},  j > k.

Condition (2) implies that lower numbered customer types have longer expected service times, while
condition (3) implies that higher numbered servers are faster.

Let π_α be a stationary policy which is optimal when the discount factor is α. In [9] it was con-
jectured that conditions (1)–(3) ensure that the action f_C chosen by π_α whenever the state is C
satisfies

(4) f_C(i) ≥ f_C(i+1),  C ∈ (I_r)^s, 1 ≤ i ≤ r−1.

By (2) and (3), a policy that satisfies (4) assigns customers with longer service times to faster
servers; such a policy will be called a longest in fastest (LIF) policy. In [9] it was shown that a LIF
policy is optimal for a two period discrete time version of the above model. We content ourselves
with proving the optimality of LIF for the case r = s = 2.


If we write A = (0, 0), B = (0, 1), C = (0, 2), D = (1, 0), E = (2, 0), F = (1, 2), G = (1, 1), H = (2, 1),
and I = (2, 2), then the CTMDP associated with r = s = 2 (hereafter abbreviated as CT)
has a state space S = {A, B, C, D, E, F, G, H, I}. We let f_{sk} denote the k'th possible action in state s.
Then the set of possible actions may be written as

    Actions  f_{A1}(1) = 2, f_{A1}(2) = 2,

             f_{A4}(1) = 2, f_{A4}(2) = 1,

             f_{B1}(1) = f_{B1}(2) = f_{C1}(1) = f_{C1}(2) = 1,

             f_{D1}(1) = f_{D1}(2) = f_{E1}(1) = f_{E1}(2) = 2,

             f_{F1}(1) = f_{F1}(2) = f_{G1}(1) = f_{G1}(2) = f_{H1}(1) = f_{H1}(2)
                       = f_{I1}(1) = f_{I1}(2) = 3

(the two remaining actions in state A, f_{A2} and f_{A3}, assign the type 1 arrival to server 1).

When action k is chosen in state i the transition rate from i to j will be written as a_{ij}^k. The transi-
tion rates for CT may then be written as

    a_{AB} = λ₁, a_{AC} = λ₂, a_{AD} = λ₁, a_{AE} = λ₂,
    a_{BA} = μ₁₂, a_{BG} = λ₁, a_{BH} = λ₂, a_{CA} = μ₂₂,
    a_{CF} = λ₁, a_{CI} = λ₂, a_{DA} = μ₁₁, a_{DF} = λ₂, a_{DG} = λ₁, a_{EA} = μ₂₁,
    a_{EH} = λ₁, a_{EI} = λ₂, a_{FC} = μ₁₁, a_{FD} = μ₂₂, a_{GB} = μ₁₁, a_{GD} = μ₁₂,
    a_{HB} = μ₂₁, a_{HE} = μ₁₂, a_{IC} = μ₂₁, a_{IE} = μ₂₂

(the rates a_{AB}, a_{AC} apply under the actions sending the corresponding arrival to server 2, and
a_{AD}, a_{AE} under the actions sending it to server 1).

Defining q_i^k to be the rate at which rewards are earned when the state is i and action k is
chosen, we have

    q_A^1 = q_A^2 = q_A^3 = q_A^4 = 0,  q_B = μ₁₂,  q_C = μ₂₂,  q_D = μ₁₁,  q_E = μ₂₁,
    q_F = μ₁₁ + μ₂₂,  q_G = μ₁₁ + μ₁₂,  q_H = μ₂₁ + μ₁₂,  q_I = μ₂₁ + μ₂₂.
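The rates and reward rates above need not be tabulated by hand: they follow mechanically from the model description. A minimal sketch, with `mu` a dictionary of hypothetical rates μ_{ij} (type i at server j) supplied by the caller:

```python
def ct_rates(lam1, lam2, mu):
    """Transition rates of CT for r = s = 2.  State (i, j): type-i customer
    at server 1, type-j at server 2 (0 = idle).  Rates out of (0, 0) are
    omitted because they depend on the action chosen there."""
    states = [(i, j) for i in range(3) for j in range(3)]
    rates = {}
    for (i, j) in states:
        if i == 0 and j == 0:
            continue                                   # action-dependent
        if i > 0:
            rates[((i, j), (0, j))] = mu[(i, 1)]       # server 1 completes
        if j > 0:
            rates[((i, j), (i, 0))] = mu[(j, 2)]       # server 2 completes
        if i == 0 and j > 0:                           # only server 1 idle
            rates[((0, j), (1, j))] = lam1             # forced assignment
            rates[((0, j), (2, j))] = lam2
        if j == 0 and i > 0:                           # only server 2 idle
            rates[((i, 0), (i, 1))] = lam1
            rates[((i, 0), (i, 2))] = lam2
    return rates

def reward_rate(state, mu):
    """q_i: total service rate in progress in the given state."""
    (i, j) = state
    return (mu[(i, 1)] if i > 0 else 0.0) + (mu[(j, 2)] if j > 0 else 0.0)
```

For instance, with F = (1, 2) this reproduces a_{FC} = μ₁₁, a_{FD} = μ₂₂ and q_F = μ₁₁ + μ₂₂.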

We now assume that (1)–(3) are valid for the case r = s = 2; that is, for i = 1, 2,

(5) μ₁₂ + μ₂₁ ≥ μ₁₁ + μ₂₂,

(6) μ_{2i} ≥ μ_{1i},

(7) μ_{i2} ≥ μ_{i1}.

For the present problem the only times at which a non-trivial decision can be made are when 
an arrival finds both servers idle. Our goal is to prove the following result: 

THEOREM 1: If (5)–(7) are valid, then for any discount factor α, the expected discounted
reward earned over an infinite horizon is maximized by a policy that always chooses action 1 or 4
in state A.


As desired, this result shows that (5)–(7) ensure that an LIF policy is optimal for the case
r = s = 2.
To prove Theorem 1 we will consider a discrete time Markov decision process, DT, that is
computationally equivalent to CT. By computationally equivalent, we mean that CT and DT have
identical state and action spaces and that for any discount factor α there exists a discount factor β
such that the α-optimal stationary policies for CT are identical to the β-optimal stationary policies
for DT. Let CT be described by a state space S, action spaces A_i, transition rates {a_{ij}^k} and reward
rates {q_i^k}. If rewards in CT are discounted by α, it follows from page 121 of [4] that for

    N > max_{i,k} |a_{ii}^k|


CT is computationally equivalent to the discrete time Markov decision process described by the
following:

(8)
    State Space: S

    Action Spaces: A_i, i ∈ S

    Transition Probabilities: P_{ij}^k = δ_{ij} + a_{ij}^k/N,  i, j ∈ S, k ∈ A_i,
        where δ_{ij} is the Kronecker delta

    Discount Factor: β = N/(N + α)

    Rewards: r_i^k = βq_i^k/N,  i ∈ S, k ∈ A_i.
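The construction (8) is short to implement. A sketch, with `states`, `rates` and `rewards` as assumed inputs (`rates[(i, j)]` holds a_{ij} for i ≠ j; diagonal entries are implied as minus the total outflow):

```python
def uniformize(states, rates, rewards, N, alpha):
    """Discrete-time equivalent of a CTMDP per (8):
    P[i][j] = delta_ij + a_ij/N, beta = N/(N+alpha), r_i = beta*q_i/N."""
    P = {i: {j: (1.0 if i == j else 0.0) for j in states} for i in states}
    for (i, j), a in rates.items():
        P[i][j] += a / N       # off-diagonal mass
        P[i][i] -= a / N       # diagonal self-loop absorbs the rest
    beta = N / (N + alpha)
    r = {i: beta * rewards.get(i, 0.0) / N for i in states}
    return P, beta, r
```

Provided N exceeds the largest total outflow rate, every row of P is a probability distribution; the self-loops introduced on the diagonal are exactly what makes the discrete chain mimic the continuous one.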

We note that the use of a discrete time Markov decision process to gain insight into the struc-
ture of an optimal policy for a continuous time Markov decision process has recently been used
(with great success) by Lippman [7]. If we choose the N in (8) to be the Λ defined on page 690 of
[7], then a discrete time Markov decision process identical to the one considered by Lippman is
obtained. For our purposes we define DT to be the version of (8) associated with

    N = λ₁ + λ₂ + μ₁₁ + μ₂₂ + μ₂₁ + μ₁₂.


Then the proof of Theorem 1 is equivalent to 

THEOREM 1': If (5)-(7) hold, then action 1 or action 4 is optimal in state A of DT. 

Before proving Theorem 1′ we need to consider the problem of operating DT for n < ∞ periods
so as to maximize expected discounted reward. A straightforward inductive argument (see Theorem
2.1 of [3]) shows that, among randomized rules that depend on the past history of the system, the
expected discounted reward earned during n periods is maximized by a rule that depends only on
the present state of DT and the number of periods for which DT is to be operated. Let R be the
class of all such rules. A typical r ∈ R may be written as r = (r¹, r², …) where r^k: {A} → {1, 2, 3, 4}.
If r^k(A) = j, then action j is chosen if DT is in state A and is to be operated for k periods; in all
other states, there is no freedom to choose an action, so action 1 is always taken. Let F_n[r|(i, j)] be
the expected discounted reward earned in operating DT for n periods when rule r is followed and
the initial state is (i, j). The above remarks imply that there exists an r̄ = (r̄¹, r̄², …) ∈ R with
the property that for all (i, j) and n ≥ 1,

    F_n[r̄|(i, j)] ≥ F_n[r|(i, j)]  for all r ∈ R.


Write F_n(i, j) for F_n[r̄|(i, j)], with F₀ ≡ 0 for n = 0. Let Ā = (λ₁+λ₂)/N, A₁ = λ₁/N, A₂ = λ₂/N and
S_{ij} = μ_{ij}/N. Then for i, j = 1, 2,

(9a) F_n(i, 0) = βS_{i1} + βS_{i1}F_{n-1}(0, 0) + βA₁F_{n-1}(i, 1) + βA₂F_{n-1}(i, 2) + β(1 − Ā − S_{i1})F_{n-1}(i, 0),

(9b) F_n(0, i) = βS_{i2} + βS_{i2}F_{n-1}(0, 0) + βA₁F_{n-1}(1, i) + βA₂F_{n-1}(2, i) + β(1 − Ā − S_{i2})F_{n-1}(0, i),

(9c) F_n(i, j) = βS_{i1} + βS_{j2} + βS_{i1}F_{n-1}(0, j) + βS_{j2}F_{n-1}(i, 0) + β(1 − S_{i1} − S_{j2})F_{n-1}(i, j), and

(9d) F_n(0, 0) = max {G_n(1), G_n(2)}, where

     G_n(1) = βA₁F_{n-1}(1, 0) + βA₂ max[F_{n-1}(2, 0), F_{n-1}(0, 2)] + β(1 − Ā)F_{n-1}(0, 0),

     G_n(2) = βA₁F_{n-1}(0, 1) + βA₂ max[F_{n-1}(2, 0), F_{n-1}(0, 2)] + β(1 − Ā)F_{n-1}(0, 0).
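The recursion (9a)–(9d) can be iterated directly, which gives a numerical check of the lemmas below and of the LIF property. A sketch; the rate values used in the usage are assumptions chosen only to satisfy (5)–(7):

```python
def finite_horizon_values(lam1, lam2, mu, alpha, n_periods):
    """Iterate (9a)-(9d) for DT.  mu[(i, j)] is the rate of a type-i customer
    at server j.  Returns the n-period values F plus the last candidate
    values G_n(1), G_n(2) from (9d)."""
    N = lam1 + lam2 + sum(mu.values())          # uniformization constant
    beta = N / (N + alpha)
    Abar, A1, A2 = (lam1 + lam2) / N, lam1 / N, lam2 / N
    S = {k: v / N for k, v in mu.items()}
    F = {(i, j): 0.0 for i in range(3) for j in range(3)}   # F_0 = 0
    g1 = g2 = 0.0
    for _ in range(n_periods):
        G = {}
        for i in (1, 2):
            G[(i, 0)] = beta * (S[(i, 1)] * (1.0 + F[(0, 0)])          # (9a)
                                + A1 * F[(i, 1)] + A2 * F[(i, 2)]
                                + (1.0 - Abar - S[(i, 1)]) * F[(i, 0)])
            G[(0, i)] = beta * (S[(i, 2)] * (1.0 + F[(0, 0)])          # (9b)
                                + A1 * F[(1, i)] + A2 * F[(2, i)]
                                + (1.0 - Abar - S[(i, 2)]) * F[(0, i)])
            for j in (1, 2):
                G[(i, j)] = beta * (S[(i, 1)] * (1.0 + F[(0, j)])      # (9c)
                                    + S[(j, 2)] * (1.0 + F[(i, 0)])
                                    + (1.0 - S[(i, 1)] - S[(j, 2)]) * F[(i, j)])
        inner = max(F[(2, 0)], F[(0, 2)])
        g1 = beta * (A1 * F[(1, 0)] + A2 * inner + (1.0 - Abar) * F[(0, 0)])
        g2 = beta * (A1 * F[(0, 1)] + A2 * inner + (1.0 - Abar) * F[(0, 0)])
        G[(0, 0)] = max(g1, g2)                                        # (9d)
        F = G
    return F, g1, g2
```

Under rates satisfying (5)–(7) one observes F_n(0, 1) ≥ F_n(1, 0) (Lemma 1), so G_n(2) ≥ G_n(1): the type 1 arrival goes to the faster server, which is exactly the LIF choice of Theorem 1′.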

Our development will require the following properties of F_n(i, j).

LEMMA 1: For n ≥ 1, F_n(0, 1) ≥ F_n(1, 0).

LEMMA 2: For n ≥ 1, F_n(2, 1) ≥ F_n(1, 2).

LEMMA 3: For n ≥ 1, F_n(0, 1) − F_n(1, 0) ≥ F_n(0, 2) − F_n(2, 0).

LEMMA 4: For n ≥ 1,

(10) 1 + F_n(0, 2) ≥ F_n(1, 2),

(11) 1 + F_n(1, 0) ≥ F_n(1, 1),

(12) 1 + F_n(0, 0) ≥ F_n(0, 1),

(13) 1 + F_n(2, 0) ≥ F_n(2, 1),

(14) 1 + F_n(0, 1) ≥ F_n(1, 1),

(15) 1 + F_n(0, 0) ≥ F_n(1, 0).

LEMMA 5: For n ≥ 1,

(16) F_n(1, 2) ≥ F_n(1, 1),

(17) F_n(0, 2) ≥ F_n(0, 1),

(18) F_n(2, 2) ≥ F_n(2, 1).


Lemmas 1–5 will be proven by induction. The trivial verification of Lemmas 1–5 for n = 1 is
omitted. To complete the inductive proof we assume the Lemmas are valid for n−1 and verify
them for n.

PROOF OF LEMMA 1:

F_n(0, 1) = βA₁F_{n-1}(1, 1) + βA₂F_{n-1}(2, 1) + S₁₂[β + βF_{n-1}(0, 0)] + β(1 − Ā − S₁₂)F_{n-1}(0, 1)  [by (9b)]
          ≥ βA₁F_{n-1}(1, 1) + βA₂F_{n-1}(1, 2) + S₁₂[β + βF_{n-1}(0, 0)] + β(1 − Ā − S₁₂)F_{n-1}(1, 0)

(by Lemmas 1 and 2 of the induction hypothesis)

          ≥ F_n(1, 0).  (by (7), (9a), and (15))

PROOF OF LEMMA 2: Substitution of (9c) shows that Lemma 2 is equivalent to

(19) S₁₂[β + βF_{n-1}(2, 0)] + S₂₁[β + βF_{n-1}(0, 1)] + β(1 − S₁₂ − S₂₁)F_{n-1}(2, 1)

     ≥ S₁₁[β + βF_{n-1}(0, 2)] + S₂₂[β + βF_{n-1}(1, 0)] + β(1 − S₁₁ − S₂₂)F_{n-1}(1, 2).

By Lemmas 2 and 3 of the induction hypothesis, (19) is valid if

(20) β(S₁₁ + S₂₂ − S₁₂ − S₂₁)F_{n-1}(1, 2) + (S₁₂ − S₁₁)[β + βF_{n-1}(0, 2)]

     + (S₁₂ − S₂₂)[β + βF_{n-1}(1, 0)] + (S₂₁ − S₁₂)[β + βF_{n-1}(0, 1)] ≥ 0.

By (6) and (7) plus Lemma 1 and (17) of the induction hypothesis, (20) is valid if

    β(S₁₂ + S₂₁ − S₁₁ − S₂₂)[1 + F_{n-1}(0, 2) − F_{n-1}(1, 2)] ≥ 0.

The last inequality holds by (5) and (10) of the induction hypothesis.

PROOF OF LEMMA 3: Substitution of (9a) and (9b) shows that Lemma 3 is equivalent to

(21) βA₁F_{n-1}(1, 1) + βA₂F_{n-1}(2, 1) + S₁₂[β + βF_{n-1}(0, 0)] + β(1 − Ā − S₁₂)F_{n-1}(0, 1)

     − {βA₁F_{n-1}(1, 1) + βA₂F_{n-1}(1, 2) + S₁₁[β + βF_{n-1}(0, 0)] + β(1 − Ā − S₁₁)F_{n-1}(1, 0)}

     ≥ βA₁F_{n-1}(1, 2) + βA₂F_{n-1}(2, 2) + S₂₂[β + βF_{n-1}(0, 0)] + β(1 − Ā − S₂₂)F_{n-1}(0, 2)

     − {βA₁F_{n-1}(2, 1) + βA₂F_{n-1}(2, 2) + S₂₁[β + βF_{n-1}(0, 0)] + β(1 − Ā − S₂₁)F_{n-1}(2, 0)}.

By Lemmas 2 and 3 of the induction hypothesis, (21) will hold if

    β(S₂₂ − S₂₁)F_{n-1}(2, 0) + β(S₂₂ − S₁₂)F_{n-1}(0, 1) + β(S₁₁ − S₂₂)F_{n-1}(1, 0)

    + (S₁₂ + S₂₁ − S₁₁ − S₂₂)[β + βF_{n-1}(0, 0)] ≥ 0.

The last inequality is valid by (5)–(7) plus Lemma 1, Lemma 3, (15), and (17) of the induction
hypothesis.

PROOF OF LEMMA 4: In the interests of brevity only the proof of (10) is given; the proofs
of (11)–(15) are similar. Substitution of (9b) and (9c) into (10) shows that (10) is equivalent to

(22) 1 − β(1 − Ā) + βA₁F_{n-1}(1, 2) + βA₂F_{n-1}(2, 2) + S₂₂[2β + βF_{n-1}(0, 0)]

     + β(1 − Ā − S₂₂)[1 + F_{n-1}(0, 2)] ≥ βĀF_{n-1}(1, 2) + S₁₁[β + βF_{n-1}(0, 2)]

     + S₂₂[β + βF_{n-1}(1, 0)] + β(1 − Ā − S₁₁ − S₂₂)F_{n-1}(1, 2).

By (10), (18), and Lemma 2 of the induction hypothesis, (22) will hold if

    1 − β(1 − Ā) + S₂₂[2β + βF_{n-1}(0, 0)] ≥ S₂₂[β + βF_{n-1}(1, 0)].

Since β < 1, the last inequality holds by (15) of the induction hypothesis.

PROOF OF LEMMA 5: In the interests of brevity only the proof of (16) is given; the proofs
of (17) and (18) are similar.

Substitution of (9b) and (9c) into (16) shows that (16) is equivalent to

(23) S₁₁[β + βF_{n-1}(0, 2)] + S₂₂[β + βF_{n-1}(1, 0)] + β(1 − S₁₁ − S₂₂)F_{n-1}(1, 2)

     ≥ S₁₁[β + βF_{n-1}(0, 1)] + S₁₂[β + βF_{n-1}(1, 0)] + β(1 − S₁₁ − S₁₂)F_{n-1}(1, 1).

By (16) and (17) of the induction hypothesis, (23) will hold if

    (S₂₂ − S₁₂)[β + βF_{n-1}(1, 0)] + β(1 − S₁₁ − S₂₂)F_{n-1}(1, 1) ≥ β(1 − S₁₁ − S₁₂)F_{n-1}(1, 1),

that is,

    (S₂₂ − S₁₂)[β + βF_{n-1}(1, 0) − βF_{n-1}(1, 1)] ≥ 0.

The last inequality holds by (6) and (11) of the induction hypothesis.

The following result will enable us to give an easy proof of Theorem 1'. 

LEMMA 6: For n ≥ 1, r̄ⁿ(A) ∈ {1, 4}.

PROOF: By (9d) the result follows immediately from Lemma 1. 

We now give the proof of Theorem 1′ (and therefore Theorem 1 as well).

PROOF OF THEOREM 1': Lemma 6 and a well-known turnpike theorem (see Theorem 7.10 
of [2]) imply that if DT is operated for an infinite number of periods then it is optimal to take 
action 1 or 4 whenever the state is A. This, of course, is the desired result. 

We conclude this section by giving a heuristic interpretation of the optimality of LIF policies
when conditions (5)–(7) are satisfied. Suppose (5) is satisfied with equality; that is, μ₂₂ − μ₂₁ = μ₁₂ − μ₁₁.
Then the optimality of a LIF policy can be interpreted as follows: given that both customer types
gain equally from assignment to the faster server, it is more important to rid the system of customers
with longer service times. With more customer and server types, however, the problem of proving
or disproving the optimality of LIF policies becomes much more difficult. If customers can be
switched between servers, however, then Johansen [5] and Winston [9] have shown the optimality
of LIF policies with respect to the criterion of maximizing, with respect to stochastic order, the
number of customers to complete their service in any time t < ∞.


Consider a two-server two-customer version of the previous model in which customers who
find both servers occupied wait until a server is available. If a customer's type becomes known
only when he is about to enter service and customers are admitted to service on a first come, first
served basis, then the method used to prove Theorems 1 and 1′ can be utilized to yield an incredibly
tedious proof of the optimality of an LIF policy for this model.



This paper is based on Chapter 6 of the author's Ph. D. dissertation at Yale University. The 
author is grateful for the guidance provided by the members of his thesis committee, Matthew J. 
Sobel and Ward Whitt. The author also acknowledges the financial assistance of the United States 
Public Health Service and National Science Foundation through grants HS-00090-4 and GK 


[1] Carter, G. M., J. M. Chaiken and E. Ignall, "Response Areas for Two Emergency Units,"
Operations Research, 20, 571-594 (1972).

[2] Denardo, E. V., Dynamic Programming: Theory and Application (Prentice-Hall, 1976).

[3] Derman, C., Finite State Markovian Decision Processes (Academic Press, 1970).

[4] Howard, R. A., Dynamic Programming and Markov Processes (M.I.T. Press, 1960).

[5] Johansen, S., "Existence of an Assignment Policy Maximizing (in the Sense of Stochastic Order)
the Output from a Heterogeneous G/M/s/s Queuing System," submitted to Operations Research.

[6] Kotiah, T. and N. Slater, "On Two Server Poisson Queues with Two Types of Customers,"
Operations Research, 21, 597-603 (1973).

[7] Lippman, S., "Applying a New Device in the Optimization of Exponential Queuing Sys-
tems," Operations Research, 23, 687-710 (1975).

[8] Rolfe, A., "The Control of a Multiple Facility, Multiple Channel Queuing System with Parallel
Input Streams," Technical Report Number 22, Graduate School of Business, Stanford Uni-
versity (1965).

[9] Winston, W., "Optimal Operation of Congestion Systems with Heterogeneous Arrivals and
Servers," Ph.D. dissertation, School of Organization and Management, Yale University (1975).



Markku Kallio* 

Helsinki School of Economics 
Helsinki, Finland 


Consider a standard linear programming problem and suppose that there are 
bounds available for the decision variables such that those bounds are not violated 
at an optimal solution of the problem (but they may be violated at some other 
feasible solutions of the problem). Thus, these bounds may not appear explicitly 
in the problem, but rather they may have been derived from some prior knowledge 
about an optimal solution or from the explicit constraints of the problem. 

In this paper, the bounds on variables are used to compute bounds on the 
optimal value when the problem is being solved by the simplex method. The latter 
bounds may then be used as a termination criterion for the simplex iterations for 
the purpose of finding a "sufficiently good" near optimal solution. The bounds 
proposed are such that the computational effort in evaluating them is insignificant 
compared to that involved in the simplex iterations. A numerical example is given 
to demonstrate their performance. 


Our purpose is to establish bounds on the optimal value of a linear programming problem. 
These bounds can be used as a rule for stopping when simplex iterations are carried out in order to 
find a near optimal solution. Our motivation is that a practical problem usually has a 
large number of (feasible) extreme point solutions whose objective function values are relatively close to 
the optimal one. A large number of iterations can therefore be expected to yield no significant 
improvement in the objective function value. We intend to avoid the computational work involved 
in these iterations. 

The bounds can be utilized when a near optimal solution is satisfactory compared to the 
optimal solution. This is usually the case for linear programming models in practice. We shall also 
point out special applications for the branch and bound method (e.g. [6]) and for Dantzig-Wolfe 
decomposition [4]. 

Of course, any feasible solution to a (maximization) problem determines a lower bound on the 
optimal value. To find upper bounds we utilize the duality theory (see e.g. [1]) or, equivalently, 
Lagrangean relaxation [5]. A restricted dual problem is solved when evaluating an upper bound. 

*This work was carried out while the author was at the European Institute for Advanced Studies in 
Management, Brussels, Belgium. 


302 M. KALLIO 

Essentially this requires minimization of a piecewise linear convex function. The restriction is 
chosen so that some computation which was already carried out by the simplex method can be 
utilized again. Because of these two characteristics the evaluation of an upper bound is computa- 
tionally inexpensive. 


We consider the usual linear programming problem (LP): 

find x ∈ R^n to 

(LP.1) max cx 
(LP.2) s.t. Ax = b 
(LP.3) 0 ≤ x ≤ t, 

where A = (a_j) ∈ R^{m×n}, a_j ∈ R^m for all j, b ∈ R^m, c = (c_j) ∈ R^n, and t ∈ R^n. Here some of the com- 
ponents of the upper bound vector t may be infinite. For l = (l_j) ≥ 0 and u = (u_j) in R^n we define the 
problem (LP(l, u)), which differs from (LP) in that (LP.3) is replaced by: 

(1) l ≤ x ≤ u. 

If (·) is a linear programming problem, we denote its optimal value by v(·). In this paper we shall 
assume that l and u are such that: 

(2) v(LP) = v(LP(l, u)). 

Our purpose in defining (LP(l, u)) is to find upper bounds on v(LP) by first finding dual 
suboptimal solutions for (LP(l, u)) and then applying the weak duality result (e.g. [1]) together 
with (2). For this purpose we think of the constraints (1) as being such that 0 ≤ l ≤ u ≤ t and u < ∞. 
(Note that this is not necessary for the following to be valid, but otherwise we may obtain bounds 
which are infinite and, therefore, useless.) Thus, we think of (LP(l, u)) as being a restriction of 
(LP) but having the same optimal value as (LP) does. Such vectors l and u may be derived from 
(LP.2) and (LP.3) (for an example see Section 4), from (LP.3) alone, or even from empirical 
knowledge about the problem. 


Suppose (LP) is being solved by the simplex method (possibly combined with the upper 
bounding technique [2]). Denote by z the current value of the objective function (corresponding to 
a feasible solution for (LP)), by λ ∈ R^m the current price vector for the constraints (LP.2), and let 
c̄_j = c_j − λa_j for all j. 

Thus, c̄_j is the reduced cost of the variable x_j if it is currently not at its upper bound. Otherwise 
c̄_j is the price associated with the upper bound of x_j. What we shall call simple bounds on v(LP) are 
given as follows: 

THEOREM 1: If l and u satisfy (2) and we denote δ = (δ_j) and μ = (μ_j), where δ_j = max{0, c̄_j} 
and −μ_j = min{0, c̄_j}, for all j, then 

(3) z ≤ v(LP) = v(LP(l, u)) ≤ λb + δu − μl. 


PROOF: The left-hand inequality follows from the feasibility of the current solution for (LP), 
and the equality is valid by assumption (2). To prove the right-hand inequality, we first state the dual 
of (LP(l, u)), denoting it by (D): 

find λ ∈ R^m and δ = (δ_j), μ = (μ_j) ∈ R^n to 

(D.1) min λb + δu − μl 

(D.2) s.t. λA + δ − μ ≥ c 

(D.3) δ, μ ≥ 0. 

We now verify directly that (λ, δ, μ) is a feasible solution for (D). Thus, the right-hand inequality follows 
from the weak duality theorem (e.g. [1]).|| 
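The simple bounds of Theorem 1 cost a single pass over the reduced costs. The following sketch is ours, not the author's (the function name and the plain-sequence data layout are assumptions):

```python
# Hypothetical helper (not from the paper): given the current simplex data,
# return the simple lower and upper bounds of Theorem 1.

def simple_bounds(z, lam, b, cbar, l, u):
    """z: current objective value; lam: price vector; b: right-hand side;
    cbar: reduced costs; l, u: the derived bounds on the variables.
    Returns (z, lam*b + delta*u - mu*l), where delta_j = max{0, cbar_j}
    and mu_j = -min{0, cbar_j} as in Theorem 1."""
    lam_b = sum(p * q for p, q in zip(lam, b))
    delta = [max(0.0, cj) for cj in cbar]
    mu = [-min(0.0, cj) for cj in cbar]
    upper = lam_b + sum(dj * uj for dj, uj in zip(delta, u)) \
                  - sum(mj * lj for mj, lj in zip(mu, l))
    return z, upper
```

With the data of the worked example later in the paper (z = 12, λ = (0, 2, 0), b = (4, 6, 4), (c̄_j) = (1, 1, 0, 0, −2, 0), l = 0, u = (4, 2, 4, 4, 6, 4)), this returns the bounds 12 and 18.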


We shall show how the simple bounds can be easily improved. The method used to obtain the 
improved bounds tends to avoid extra computation by utilizing some computation which has already 
been carried out by the simplex method. It will become clear that this computation would other- 
wise represent a major effort in computing the upper bound. 

We shall first consider a class of restrictions of (D) that are easy to solve (and thereby obtain 
a suboptimal solution for (D)). Let d and g be vectors in R^m and let (RD) be the problem which 
results from (D) when λ is restricted to the following set: 

(4) {λ | λ = g + θd, θ ∈ R}. 

For a moment we consider g and d as being arbitrarily chosen. Later on we shall specify them in a 
way which makes the evaluation of the optimal value of (RD) (the improved bound) computation- 
ally simple. 

We can now state (RD) as follows: 

find θ ∈ R and δ, μ ∈ R^n to 

(RD.1) min K + pθ + δu − μl 

(RD.2) s.t. θh + δ − μ ≥ c* 

(RD.3) δ, μ ≥ 0, 

where 

(5) K = gb, p = db, 

and 

(6) h = (h_j) = dA, c* = (c*_j) = c − gA. 

Let v(RD(θ)) be the optimal value of (RD) for a given θ. We shall find v(RD) via minimization of v(RD(θ)). 
The result below shows that v(RD(θ)) is a convex and piecewise linear function whose value 
and marginal value (for any fixed θ) are readily available. Thus v(RD(θ)) can be minimized by 
marginal analysis: find θ* such that at θ = θ* the marginal value of v(RD(θ)) vanishes or changes its 
sign. 

THEOREM 2: v(RD(θ)) is a convex and piecewise linear function of θ. If v+(RD(θ)) is 
the right-hand derivative of v(RD(θ)) (with respect to θ), then the possible discontinuity points of 
v+(RD(θ)) are θ_1, . . ., θ_n, where: 

(7) θ_j = c*_j/h_j if h_j ≠ 0, and θ_j = ∞ if h_j = 0, for all j. 

Furthermore, 

(8) v+(RD(θ)) = p − Σ_{j ∈ I(θ)} u_j h_j − Σ_{j ∉ I(θ)} l_j h_j 

and 

(9) v(RD(θ)) = K + pθ + Σ_{j ∈ I(θ)} u_j (c*_j − h_j θ) + Σ_{j ∉ I(θ)} l_j (c*_j − h_j θ), 

where 

(10) I(θ) = {j | θ_j ≤ θ and h_j < 0, or θ_j > θ and h_j ≥ 0}. 

PROOF: For a fixed value of θ, (RD) decomposes into a small problem (j(θ)) for each j: 

find δ_j, μ_j ∈ R to 
min u_j δ_j − l_j μ_j 
s.t. δ_j − μ_j ≥ c*_j − h_j θ 
δ_j, μ_j ≥ 0. 

Let θ_j be given by (7). Then, because 0 ≤ l_j ≤ u_j, 

(11) v(j(θ)) = u_j (c*_j − h_j θ) if j ∈ I(θ), and v(j(θ)) = l_j (c*_j − h_j θ) otherwise. 

Thus (because 0 ≤ l_j ≤ u_j) v(j(θ)) is a convex and piecewise linear function (the only possible dis- 
continuity point of its derivative being θ_j). In this notation, we have: 

(12) v(RD(θ)) = K + pθ + Σ_j v(j(θ)). 

Thus, v(RD(θ)) is a convex and piecewise linear function because it is a sum of a finite 
number of such functions. Clearly, the possible discontinuity points of its (right-hand) derivative 
are θ_1, . . ., θ_n. (9) follows by combining (10)-(12), and then (8) follows from (9) and (10).|| 

When solving (RD) we first choose some θ_i ∈ {θ_j | j = 1, . . ., n}. If v+(RD(θ)) equals zero or changes 
its sign at θ = θ_i, then θ_i is optimal. Otherwise, if v+(RD(θ_i)) < 0 (> 0), we increase (decrease) θ 
from θ_i to the next largest (smallest) element in {θ_j | j = 1, . . ., n}. We continue similarly until the 
optimum is found. (Of course, alternative search techniques (for example Fibonacci search (e.g. 
[8])) can be used to find the optimal θ.) Then, applying (9), we evaluate v(RD). We call this the 
improved upper bound. 
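The search just described can be sketched in a few lines. This illustration is ours, not the author's code: it evaluates v(RD(θ)) directly through (9) and (11) and scans the finite breakpoints θ_j = c*_j/h_j, which suffices because a piecewise linear convex function with a finite minimum attains it at a breakpoint.

```python
# Illustrative sketch (function names ours): evaluate and minimize v(RD(theta)).

def v_rd(theta, K, p, h, cstar, l, u):
    """v(RD(theta)) per (9)/(11): subproblem j contributes
    u_j*(c*_j - h_j*theta) when that quantity is nonnegative,
    and l_j*(c*_j - h_j*theta) otherwise."""
    val = K + p * theta
    for hj, cj, lj, uj in zip(h, cstar, l, u):
        t = cj - hj * theta
        val += uj * t if t >= 0 else lj * t
    return val

def improved_bound(K, p, h, cstar, l, u):
    """Scan the finite breakpoints theta_j = c*_j / h_j (h_j != 0) and
    return the smallest value of v(RD(theta)) found there."""
    points = sorted({cj / hj for hj, cj in zip(h, cstar) if hj != 0})
    return min(v_rd(t, K, p, h, cstar, l, u) for t in points)
```

With the data of the worked example below (K = 0, p = 12, h = (4, 4, 6, 0, 2, 0), c* = (5, 5, 6, 0, 0, 0), l = 0, u = (4, 2, 4, 4, 6, 4)), the scan visits θ ∈ {0, 1, 1.25} and returns 15, the improved bound.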


The following result together with Theorem 2 shows that, if each component of u is finite, then 
we obtain finite upper bounds on v(LP). 

THEOREM 3: If (LP) is feasible, (2) holds, and u < ∞, then 

(13) −∞ < v(RD) = inf_θ v(RD(θ)) < ∞. 


PROOF: By assumption, (LP(l, u)) is feasible. Therefore v(D) > −∞. Since (RD) is a 
restriction of (D), this implies v(RD) ≥ v(D) > −∞. 

One verifies directly that (RD) is feasible if u < ∞ (that is, (RD) has a feasible solution with the 
objective function value < ∞). Thus, v(RD) < ∞. The equality in (13) follows easily from a contradic- 
tory assumption.|| 

We consider now the computational effort needed in evaluating the improved upper bound. 
First, the constants are computed according to (5)-(7). In general, the major effort here is caused 
by the large number of inner products in computing the vectors h and c*. As is the case for reduced 
costs (see [7]), this effort may not be insignificant when compared with the computational effort in 
one simplex iteration. When performing the optimality test for different values of θ, (8) is applied 
repeatedly. However, when one marginal value is known, another can be computed recursively 
from it. This yields computational savings when the simple search described above is applied. 
Finally, (9) is applied once to evaluate the bound. 

We shall suggest below two rules for choosing the vectors g and d so that most of the extra 
computation needed to compute h and c* is avoided. Obviously other rules can be constructed with 
the same advantage. 

RULE A: If λ is the current price vector, we define d = λ and g = 0. Then h = c − (c̄_j) and 
c* = (c_j). Note that now v(RD(θ)) for θ = 1 equals the simple upper bound of Theorem 1. Thus the im- 
proved bound is at least as good as the simple one in this case. 

RULE B: If (d′, g′) is the vector pair chosen at the preceding iteration (as (d, g)) and θ* is the 
minimizer of v(RD(θ)) in that iteration, we define d = λ and g = g′ + θ*d′. Now v(RD(θ)), for θ = 0, 
equals the bound evaluated at the preceding iteration. Thus the sequence of bounds is monotonically 
decreasing in this case. 


We shall next give an example showing the computations needed for the simple bound and the 
improved bound when Rule A is applied. Thereafter we investigate by another example the per- 
formance of the different bounds. Consider the following problem (LP): 

find x ∈ R^6 to 

max (5, 5, 6, 0, 0, 0) x 

s.t. 

[ 1 2 1 1 0 0 ]       [ 4 ] 
[ 2 2 3 0 1 0 ] x  =  [ 6 ] 
[ 4 1 1 0 0 1 ]       [ 4 ] 

x ≥ 0. 



The nonnegativity constraints together with the first equality constraint require x_1 ≤ 4, x_2 ≤ 2, 
x_3 ≤ 4 and x_4 ≤ 4, and together with the second and third equality, respectively, x_5 ≤ 6 and x_6 ≤ 4. 
Thus, we have l = 0 and u = (4, 2, 4, 4, 6, 4) defining (LP(l, u)). 

Suppose for (LP) a simplex iteration starts with the basic feasible solution x = (0, 0, 2, 2, 0, 2). 
The corresponding price vector is λ = (0, 2, 0), the vector of reduced costs (c̄_j) = (1, 1, 0, 0, −2, 0), 
and the current value of the objective function (and lower bound on the optimal value) is z = 12. In 
order to compute the upper bound on the optimal value we choose (in the notation of Section 3) 
(d, g) = (λ, 0) according to Rule A. Then the restricted dual problem (RD) is as follows: 

find θ ∈ R and δ, μ ∈ R^6 to 

min 12θ + δ(4, 2, 4, 4, 6, 4)′ 

s.t. (4, 4, 6, 0, 2, 0)θ + δ − μ ≥ (5, 5, 6, 0, 0, 0) 

δ, μ ≥ 0. 

In order to find the upper bound v(RD) we first compute, according to (7), (θ_j) = (1.25, 1.25, 
1.0, ∞, 0, ∞). Initially we compute, according to (8), the marginal value v+(RD(θ)), say, at 
θ = θ_3 = 1: v+(RD(1)) = −12. Thus the optimal value of θ is not less than θ_3. Starting from θ_3, the 
next largest element of {θ_j} is θ_1 (= θ_2) = 1.25. We evaluate v+(RD(θ_1)) = 12. Thus the marginal value 
changes its sign at θ = θ_1. Therefore, θ_1 is optimal for (RD). By (9) we compute v(RD) = 15. We 
now have 12 ≤ v(LP) ≤ 15. The simple bound given by Theorem 1 is 18 (= v(RD(1))). Thus con- 
siderable improvement is obtained by the marginal analysis approach. One can verify that v(LP) = 180/13 ≈ 13.85. 

As another example, a small production allocation problem with 19 constraints (in (LP. 2)) 
and 43 variables was investigated. Based on the constraints of this problem, the bounds on the 
decision variables were easily derived. The optimal solution was found in 17 simplex iterations (on 
Phase II) and at each iteration the simple bound as well as the improved bounds according to 
Rules A and B were evaluated. 

Figure 1 shows the bounds computed at each iteration. Notice that the current upper bound 
at any particular iteration is the smallest upper bound evaluated so far. The improved 
bounds when Rules A and B are applied perform approximately equally well, and far better than 
the simple bounds. 

Consider iteration 13, where the lower bound (the current solution value of (LP)) is 0.29 below 
the optimal value. The simple bound, and the improved bounds applying Rules A and B are 3.58, 
0.82 and 0.72, respectively, above the optimal value. Thus Rule A implies that further iterations 
can improve the objective function value by no more than 1.11 (the corresponding number for 
Rule B being 1.01). If this satisfies ones termination criterion, the remaining four iterations, that 
is, 23% of Phase II iterations, can be neglected. 


Linear programs in practice usually have a bounded optimal solution and often one can easily 
find finite bounds for each variable so that these bounds are not violated at an optimal solution of 
the problem. Therefore, the bounds developed here are expected to be useful wherever a near 












Figure 1. An Example of Bounds 

optimal solution is satisfactory for practical purposes. Two natural applications also arise. In the 
branch and bound method (e.g. [6]), when a candidate problem is being maximized and for its 
optimal value an upper bound has been found that is less than (or equal to) the incumbent value, 
then the candidate problem can be fathomed. The same simple idea is applicable in Dantzig-Wolfe 
decomposition (e.g. [4]) when predicting whether or not a subproblem is able to create a proposal 
that would improve the objective function value in the master problem. 

In an analogous way, bounds on the optimal value can be developed when generalized upper 
bounds (e.g. [3]) are known. This further suggests a special application for the transportation problem. 


[1] Dantzig, G., Linear Programming and Extensions (Princeton University Press, Princeton, New 
Jersey, 1963). 

[2] Dantzig, G., "Upper Bounds, Secondary Constraints and Block Triangularity in Linear Pro- 
gramming," Econometrica, 23, 174-183 (1955). 

[3] Dantzig, G. and R. Van Slyke, "Generalized Upper Bounding Techniques for Linear Pro- 
gramming," Journal of Computer and System Sciences, 1, 213-226 (1967). 

[4] Dantzig, G. and P. Wolfe, "Decomposition Principle for Linear Programs," Operations Re- 
search, 8, 101-111 (1960). 



[5] Geoffrion, A., "Lagrangean Relaxation for Integer Programming," in Mathematical Program- 
ming Study 2: Approaches to Integer Programming, ed. M. L. Balinski (North-Holland Pub- 
lishing Co., Amsterdam), pp. 82-114 (1974). 

[6] Geoffrion, A. and R. Marsten, "Integer Programming Algorithms: A Framework and State-of- 
the-Art Survey," Management Science, 18, 465-491 (1972). 

[7] Kallio, M. and E. Porteus, "Estimating Computational Effort for Linear Programming Algo- 
rithms," Research Paper No. 239, Graduate School of Business, Stanford University (1975). 

[8] Wagner, H., Principles of Operations Research with Applications to Managerial Decisions (Pren- 
tice-Hall, 1969). 


Jeff L. Kennington 

Southern Methodist University 
Dallas, Texas 


This paper presents the details for applying and specializing the work of 
Saigal [28] and Hartman and Lasdon [16] to develop a primal partitioning code for 
the multicommodity transportation problem. The emphasis of the paper is on 
presenting efficient data structure techniques for exploiting the underlying network 
structure of this class of problems. Computational experience with test problems 
whose corresponding linear programming formulation has over 400 rows and 2,000 
columns is presented. 


The Multicommodity Transportation Problem may be simply stated in terms of a distribution 
problem in which there are M suppliers (warehouses or factories), N customers (destinations), and 
K commodities. Each supplier, i = 1, . . ., M, has S_ik units of commodity k and each customer, 
j = 1, . . ., N, demands D_jk units of commodity k. Each supplier can ship units of commodity k to 
each destination at a shipping cost per unit of c_ijk (unit cost for shipping commodity k from supplier 
i to customer j). Further, each arc (i, j) has capacity b_ij. The objective is to determine which routes 
to use and the shipment sizes so that the total transportation cost of meeting demand, given supply 
and arc capacity constraints, is minimized. Mathematically this problem may be stated in terms 
of a block diagonally structured linear program as follows: 

min Σ_ijk c_ijk x_ijk + C Σ_jk a_jk      (MCTP) 

subject to 

Σ_j x_ijk + r_ik = S_ik,  all i and k      (π_ik) 

−Σ_i x_ijk − a_jk = −D_jk,  all j and k      (π_M+j,k) 

Σ_k x_ijk + s_ij = b_ij,  all i and j      (λ_ij) 

x_ijk, r_ik, a_jk, s_ij ≥ 0,  all i, j, and k, 

where x_ijk denotes the flow of commodity k from source i to destination j, r_ik denotes the slack 
variable associated with the supply constraint for commodity k at source i, a_jk denotes the artificial 
variable associated with the demand constraint for commodity k at destination j, and s_ij is a 
slack variable associated with the capacity constraint for the arc (i, j). The Greek letters π_ik, 
π_M+j,k, and λ_ij denote the dual variables associated with the supply, demand, and capacity 
constraints, respectively. The cost C of the artificial variables is taken to be Σ_ijk c_ijk. 
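To make the row and column structure of (MCTP) concrete, the following sketch (ours, not the authors') assembles the constraint matrix as a dense list of lists, with the variables ordered x_ijk, r_ik, a_jk, s_ij. It is meant only to exhibit the block structure exploited later in the paper.

```python
# Hypothetical illustration: build the MCTP constraint matrix for small M, N, K.
# Rows: M*K supply, N*K demand, M*N capacity.

def mctp_matrix(M, N, K):
    nx = M * N * K
    cols = nx + M * K + N * K + M * N
    x = lambda i, j, k: (i * N + j) * K + k            # column of flow x_ijk
    r = lambda i, k: nx + i * K + k                    # supply slack r_ik
    a = lambda j, k: nx + M * K + j * K + k            # artificial a_jk
    s = lambda i, j: nx + M * K + N * K + i * N + j    # capacity slack s_ij
    rows = []
    for i in range(M):                                 # supply rows (pi_ik)
        for k in range(K):
            row = [0] * cols
            for j in range(N):
                row[x(i, j, k)] = 1
            row[r(i, k)] = 1
            rows.append(row)
    for j in range(N):                                 # demand rows (pi_{M+j,k})
        for k in range(K):
            row = [0] * cols
            for i in range(M):
                row[x(i, j, k)] = -1
            row[a(j, k)] = -1
            rows.append(row)
    for i in range(M):                                 # capacity rows (lambda_ij)
        for j in range(N):
            row = [0] * cols
            for k in range(K):
                row[x(i, j, k)] = 1
            row[s(i, j)] = 1
            rows.append(row)
    return rows
```

Each flow column meets exactly one supply row (+1), one demand row (−1), and one capacity row (+1), which is the network structure the partitioning approach relies on.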

1.1 Applications 

Multicommodity network flow problems have been extensively studied because of their 
numerous applications and because of their intriguing network structure. Geoffrion and Graves [9] 
solved a large multicommodity warehouse location problem for Hunt-Wesson Foods, Inc. Multi- 
commodity models have been proposed for planning studies involving urban traffic systems (see 
Jorgensen [22] and LeBlanc [27]) and communication systems (see White [34], and Gomory and 
Hu [12]). Models for solving scheduling and routing problems have been proposed by Bellmore, 
Bennington and Lubore [2], by White and Wrathall [33], and by Swoveland [31]. A particularly 
interesting application was suggested by Clarke and Surkis [5] for assigning students to schools to 
achieve a desired ethnic composition. 

1.2 Survey of Literature 

There are two basic approaches which have been employed to develop specialized techniques 
for multicommodity network flow problems: decomposition and partitioning. Decomposition ap- 
proaches may be further characterized as price-directive or resource-directive. The papers [1, 4, 6, 
8, 32, 35] are all variations of price-directive decomposition, while the papers [17, 29, 31] are 
resource-directive decomposition techniques. Partitioning procedures may be found in [13, 14, 16, 
19, 28]. The only special results for the multicommodity transportation problem appeared in a 
recent paper by Evans, Jarvis, and Duke [7]. They showed that a necessary and sufficient condition 
for the constraint matrix of MCTP to be totally unimodular is that it have no more than two 
sources or two destinations. 

1.3 Direction and Motivation of Investigation 

In recent years there have been several extremely successful specializations of the primal 
simplex method for solving one-commodity network flow problems (see [11, 25, 30]). Primal simplex 
codes have been developed which are superior to the best out-of-kilter codes by a factor of at least 
nine to one. The primal codes are also superior to dual codes based on the same specializations. 

We believe that the success of these primal codes is attributable to three key factors. First, 
the primal simplex technique has a computational advantage over a dual simplex technique when 
the problems are extremely rectangular (i.e. problems with many more columns than rows). The 
computational burden for a dual pivot is on the order of the number of columns whereas the burden 
of a primal pivot is on the order of the number of rows. Since most network flow problems are 
rectangular, the primal method has proven superior in computational investigations. Secondly, 
the simplex specializations used by these codes strongly exploit the underlying network structure. 
Finally, these implementations use efficient data structure techniques originally developed by 
computer scientists. 



This success with one-commodity problems leads one to speculate that good results could 
also be obtained by extending these ideas for multicommodity problems. Unfortunately, LP bases 
for multicommodity problems do not exhibit the triangular property present in one-commodity 
problems. However, due to the underlying network structure, part of every LP basis for multi- 
commodity problems does exhibit the triangular property and can be exploited in executing the 
simplex operations. That is, a sizeable part of the current basis may be stored using Johnson's 
[21] triple labels (or some other appropriate representation) while the remainder of the basis is 
stored in the usual matrix form. Hartman and Lasdon [16] have shown that the simplex operations 
can be performed if one carries a triangular basis of size n for each commodity, where n denotes the 
number of nodes in the network, and a working basis inverse whose size need never exceed the 
number of saturated arcs (i.e., arcs whose total flow equals arc capacity). For the multicommodity 
transportation problem, the triangular part of the basis has dimension (M-\-N)K. The working 
basis inverse varies in size with a maximum size of MN. The remaining basic columns have a single 
nonzero entry and are also exploited in the simplex procedure. The purpose of this study is to 
investigate this general approach when applied to the multicommodity transportation problem. 

1.4 Notation 

The notation and conventions used in this exposition are now presented. Matrices and sets 
are denoted by upper case Latin letters. Lower case Latin letters underlined denote column vectors. 
The zero vector is denoted by 0, and an identity matrix is denoted by I. A′ and b′ denote the 
transposes of the matrix A and the column vector b, respectively. The symbol "≡" is used in 
place of the expression "is equivalent to," while "≢" is used in place of "is not equivalent to." 
∅ is used to denote the empty set. 


A linear program is said to have a block diagonal structure if, by row and column interchanges, 
its constraint matrix can be placed in the following form: 

[ A_1                               ] 
[        A_2                        ] 
[               ⋱                  ] 
[                      A_n          ] 
[ D_1    D_2    ⋯    D_n    D_n+1  ] 

If the diagonal blocks, A_1, . . ., A_n, each consist of a single row, then these constraints are called 
GUB (generalized upper bound) constraints. The general idea of partitioning block diagonal 
structured linear programs whose blocks have more than one row was proposed independently by 
Bennett [3] and Kaul [24]. These procedures were refined and specialized for multicommodity 



network flow problems by Saigal [28]. This specialization involves carrying a working basis inverse 
whose size need never exceed the number of saturated arcs. Hartman and Lasdon [16] developed 
efficient procedures for updating this working basis inverse. However, neither Saigal nor Hartman 
and Lasdon discuss how this procedure may be efficiently implemented. Implementation is the 
topic of interest in this paper, and the work of Saigal and of Hartman and Lasdon provides the starting 
point for our work. 

This section presents computational devices for storing and manipulating the problem data 
to enhance the efficiency of a computer implementation of the primal partitioning approach. We 
have drawn freely from the work of Ellis Johnson [20, 21], Glover, Karney, and Klingman [10], 
and our own experience in solving transportation problems [25]. Our aim is to help bridge the gap 
between the algorithm and an efficient computational implementation. 

2.1 Basis Structure 

It is well known and easily proved (see [16, 26]) that, by row and column interchanges, every 
LP basis B* for MCTP may be partitioned as follows: 

             key columns            nonkey columns 

B* =  [ B_1                        R_1                  ] 
      [        ⋱                          ⋱           ] 
      [              B_K                        R_K     ] 
      [ P_1   ⋯   P_K            T_1   ⋯   T_K        ] 
      [ S_1   ⋯   S_K            U_1   ⋯   U_K        ] 

where det(B_k) ≠ 0 for k = 1, . . ., K. That is, B_k is a basis for the following transportation problem (TP_k): 

s.t.  Σ_j x_ijk + r_ik = S_ik,  all i 

      −Σ_i x_ijk − a_jk = −D_jk,  all j 

      x_ijk, r_ik, a_jk ≥ 0. 

The definitions which follow are used to characterize a basis of a transportation problem in 
terms of a graph. The motivation is to use a set of graphs (one for each commodity) to efficiently 
carry out the simplex operations involving B_1, . . ., B_K. The notation is identical to that used by 
Johnson [20]. 



A graph G is a finite set V of vertices (nodes) v_1, . . ., v_k, and a set E of unordered pairs of 
vertices, e_p = (v_i, v_j), called edges (arcs). A path in a graph is a sequence of vertices and distinct 
edges (v_1, e_1, v_2, e_2, . . ., v_k−1, e_k−1, v_k) such that e_i = (v_i, v_i+1). A simple path is a path with distinct vertices, 
and a cycle is a simple path together with an edge from the beginning to the end of the path. A 
connected graph has at least one path between every pair of vertices. A connected graph with no 
cycles is called a tree, and a graph consisting of one or more unconnected trees is called a forest. 
A spanning subgraph of G is a graph with the same vertex set as G, and a spanning forest of G is 
a forest which is a spanning subgraph. 

Let A denote the constraint matrix of TP_k. Then A can be partitioned as A = [Ā, U], where 
Ā is a node-arc incidence matrix and U is a diagonal matrix with ones and minus ones along the 
diagonal. The rows of Ā can be viewed as representing vertices while the columns represent edges 
of the graphical representation of TP_k. Let F denote the graph associated with TP_k. Columns of 
U represent edges incident to a single vertex and are called slack or artificial arcs, corresponding 
to the slack or artificial variables. A tree with one slack or artificial arc incident to some vertex of 
the tree is called a rooted tree, and the slack or artificial arc is called the root of the tree. A forest of 
the graph corresponding to TP_k, with each tree having one root, is called a rooted forest. Let B_k 
be a basis of A. Then the subgraph generated by the basis B_k, called F_B_k, consists of all vertices of 
F, edges corresponding to columns of Ā in B_k, and roots from columns of U in B_k. Johnson [20] 
has shown that B_k is a basis of A if and only if F_B_k is a rooted spanning forest of F. 

Since bases for TP_k have a graphical characterization, this graph can be stored rather than the 
matrix. A particularly efficient storage scheme for matrices of this type involves three labels for 
each node (see [21]). We propose to use this scheme for storing each of the bases B_1, . . ., B_K. 
This allows the revised simplex operations involving B_1, . . ., B_K to be performed by tracing labels 
through a graph, rather than by matrix multiplication. 
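The paper does not reproduce Johnson's three labels here. As a purely illustrative stand-in, the sketch below stores a rooted forest with one common triple of node labels (predecessor, arc to the predecessor, depth); tracing predecessor labels from a node yields the simple path to the root of its tree, the traversal that replaces matrix multiplication.

```python
# Hypothetical rooted-forest store; the label choice is ours, for illustration.

class RootedForest:
    def __init__(self, n):
        self.pred = [None] * n    # predecessor node; None at a root
        self.arc = [None] * n     # basic arc (or root arc) up to the predecessor
        self.depth = [0] * n      # distance to the root of the tree

    def graft(self, parent, child, arc):
        """Hang `child` under `parent` via basic arc `arc`."""
        self.pred[child] = parent
        self.arc[child] = arc
        self.depth[child] = self.depth[parent] + 1

    def path_to_root(self, v):
        """Arcs on the unique simple path from v up to its root."""
        arcs = []
        while self.pred[v] is not None:
            arcs.append(self.arc[v])
            v = self.pred[v]
        return arcs
```

Depth labels let two such traces be trimmed at the first common node rather than walked all the way to the roots, which is how path intersections like the one in the next subsection can be removed cheaply.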

It will now be shown that, by column operations, B* can be converted into block diagonal 
form. Let the matrix L be defined as follows: 

L =  [ I                  −B_1⁻¹ R_1               ] 
     [      ⋱                       ⋱             ] 
     [             I                    −B_K⁻¹ R_K ] 
     [ 0    ⋯     0      I                         ] 

Post-multiplying B* by L yields 

B*L = [ B_1                                ] 
      [        ⋱                          ] 
      [              B_K                   ] 
      [ P_1   ⋯   P_K        Q            ] 
      [ S_1   ⋯   S_K        D            ] 

where Q = [Q_1, . . ., Q_K] with Q_k = T_k − P_k B_k⁻¹ R_k, and D = [D_1, . . ., D_K] with 
D_k = U_k − S_k B_k⁻¹ R_k. 

It will now be shown that Q and D can be easily generated if the bases B_1, . . ., B_K and the 
nonkey columns are known. Let x_ijk be a nonbasic or nonkey column incident on nodes n_l and n_r 
in F_B_k, where n_l is a source and n_r is a destination. Let P′_L denote the unique simple path in F_B_k from 
n_l to the root of the tree associated with n_l, including the root. All arcs in the path which are tra- 
versed in reverse direction (i.e., from destination to source) are said to have reverse orientation. 
Let P′_R be the unique simple path from the root of the tree associated with n_r to n_r, including the 
root. Let 

P_L = P′_L − (P′_L ∩ P′_R) and P_R = P′_R − (P′_L ∩ P′_R). 

For the slack variable r_ik, the definition of P′_L is as described above and P′_R = ∅. Let a* denote 
the column of TP_k corresponding to x_ijk or r_ik. The following proposition indicates a direct means 
for constructing B_k⁻¹ a*. 

PROPOSITION 1: The i-th component of y* = B_k⁻¹ a* is determined as follows: +1, if the arc 
corresponding to the i-th column of B_k is in P_L ∪ P_R with normal orientation (i.e., from source to 
destination); −1, if the arc corresponding to the i-th column of B_k is in P_L ∪ P_R with reverse orien- 
tation; 0, otherwise. A proof of Proposition 1 may be found in [25]. 
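Proposition 1 reduces the back-solve y* = B_k⁻¹a* to membership and orientation tests along two paths. A minimal sketch, with names of our own choosing and the paths and orientations assumed already traced:

```python
# Hypothetical helper: y* = B_k^{-1} a* per Proposition 1, given the basic
# arcs in column order, the path sets P_L and P_R, and each path arc's
# orientation ('normal' = traversed from source to destination).

def updated_key_column(basis_arcs, PL, PR, orientation):
    on_path = set(PL) | set(PR)
    return [0 if arc not in on_path
            else (1 if orientation[arc] == "normal" else -1)
            for arc in basis_arcs]
```

No arithmetic on B_k is needed; the whole column is filled from the two traced paths.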

Let T = {t_1, . . ., t_q}, where t_l = (r, s) implies that the l-th row of Q corresponds to the constraint 
Σ_k x_rsk + s_rs = b_rs. That is, T is the index set for the rows of Q. Let x_r, associated with commodity 
k, be some nonkey variable corresponding to the p-th column of Q. Let P_L and P_R be defined as 
above for the nonkey variable x_r. Then the p-th column of Q is easily constructed using the 
proposition below. 


PROPOSITION 2: The i-th component of the p-th column of Q is as follows: 

1, if x_r = x_uvk and t_i = (u, v); 

1, if x_r ≠ x_uvk, t_i = (u, v), and the arc corresponding to x_uvk is an element of P_L ∪ P_R with reverse 
orientation; 

−1, if x_r ≠ x_uvk, t_i = (u, v), and the arc corresponding to x_uvk is an element of P_L ∪ P_R with normal 
orientation; 

0, otherwise. 

PROOF: Let a* be the column of TP_k corresponding to x_r. The columns of Q associated 
with commodity k take the form T_k − P_k B_k⁻¹ R_k, where a* is a column of R_k. Suppose a* is the p-th 
such column, so that the column of Q in question is 

[ T_1k^p − P_1k B_k⁻¹ a* ] 
[ T_2k^p − P_2k B_k⁻¹ a* ] 
[          ⋮             ] 
[ T_qk^p − P_qk B_k⁻¹ a* ] 

Let i ∈ {1, . . ., q} and suppose t_i = (u, v). Then the i-th component of the p-th column of Q corresponds 
to arc t_i = (u, v) and is given by T_ik^p − P_ik B_k⁻¹ a*. 

CASE 1: Suppose x_r = x_uvk. Then T_ik^p = 1 and P_ik = 0′. Hence T_ik^p − P_ik B_k⁻¹ a* = 1. 

CASE 2: Suppose x_r ≠ x_uvk and the arc corresponding to x_uvk is in P_L ∪ P_R with reverse orienta- 
tion. Suppose the arc corresponding to x_uvk appears in the t-th column of B_k. Then T_ik^p = 0 and 
P_ik = [0′ 1 0′], with the 1 in the t-th position. By Proposition 1, the t-th component of B_k⁻¹ a* is −1, 
so T_ik^p − P_ik B_k⁻¹ a* = 0 − (−1) = 1. 

CASE 3: Suppose x_r ≠ x_uvk and the arc corresponding to x_uvk is in P_L ∪ P_R with normal orienta- 
tion. Suppose the arc corresponding to x_uvk is in the t-th column of B_k. Then, as in Case 2, but with 
the t-th component of B_k⁻¹ a* equal to +1 by Proposition 1, T_ik^p − P_ik B_k⁻¹ a* = 0 − 1 = −1. 

CASE 4: Suppose x_r ≠ x_uvk and the arc corresponding to x_uvk is not in the set P_L ∪ P_R. If 
P_ik = 0′, then clearly the i-th component of the p-th column of Q is zero. If P_ik has a 1 in the t-th 
position, then the t-th component of B_k⁻¹ a* is zero by Proposition 1, and again T_ik^p − P_ik B_k⁻¹ a* = 0. 

This completes the proof of Proposition 2.|| 


Let Z = {z_1, z_2, . . ., z_w}, where z_l = (r, s) implies that the l-th row of D corresponds to the con- 
straint Σ_k x_rsk + s_rs = b_rs. That is, Z is the index set for the rows of D. Using the definitions for 
x_r, P_L, and P_R given above, the following proposition shows how columns of D may be constructed. 

PROPOSITION 3: The i-th component of the p-th column of D is as follows: 

1, if x_r = x_uvk and z_i = (u, v); 

1, if x_r ≠ x_uvk, z_i = (u, v), and the arc corresponding to x_uvk is an element of P_L ∪ P_R with 
reverse orientation; 

−1, if x_r ≠ x_uvk, z_i = (u, v), and the arc corresponding to x_uvk is an element of P_L ∪ P_R with 
normal orientation; 

0, otherwise. 

The proof of Proposition 3 is identical to that given for Proposition 2, with the index set Z re- 
placing T. It is now shown that the results of the above propositions can be used to specialize the 
primal simplex method. 

2.2 Data Requirements 

Assume that at the beginning of an iteration, the following quantities are explicitly stored:

(i) B_k, k = 1, . . ., K, each stored using Johnson's [21] three labels,

(ii) the q × q matrix Q^{-1},

(iii) the dual variables π_ik, π_{M+j,k}, and λ_ij, and

(iv) the index sets T and Z.

Using the above data, it will now be shown that the revised simplex operations can be efficiently performed.

2.3 Pricing 

The relative cost factors are simply c̄¹_ik = π_ik for the slack variable r_ik, c̄²_ijk = π_ik − π_{M+j,k} + λ_ij − c_ijk for the variable x_ijk, and c̄³_ij = λ_ij for the slack variable s_ij. The optimality condition is max{c̄¹_ik, c̄²_ijk, c̄³_ij} ≤ 0 for all i, j, k. Any variable with relative cost greater than zero is a candidate to enter the basis.
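The pricing computation above can be sketched as follows. This is an illustrative sketch only, not the paper's FORTRAN: the function name and the array layout (π_ik as pi[i][k], π_{M+j,k} as pi_M[j][k], λ_ij as lam[i][j], costs as c[i][j][k]) are our own.

```python
# Hypothetical sketch of the Section 2.3 pricing step (names and layout are ours).
# Relative cost factors:
#   c1[i][k]    = pi[i][k]                                  for slack r_ik
#   c2[i][j][k] = pi[i][k] - pi_M[j][k] + lam[i][j] - c[i][j][k]  for x_ijk
#   c3[i][j]    = lam[i][j]                                 for slack s_ij
# A variable is an entering candidate when its relative cost is positive.

def price(pi, pi_M, lam, c, M, N, K):
    """Return (best_value, best_var) under the 'largest evaluator' rule."""
    best = (0.0, None)
    for i in range(M):
        for k in range(K):
            if pi[i][k] > best[0]:
                best = (pi[i][k], ('r', i, k))
            for j in range(N):
                rc = pi[i][k] - pi_M[j][k] + lam[i][j] - c[i][j][k]
                if rc > best[0]:
                    best = (rc, ('x', i, j, k))
    for i in range(M):
        for j in range(N):
            if lam[i][j] > best[0]:
                best = (lam[i][j], ('s', i, j))
    return best
```

If `best[1]` is `None`, all relative costs are nonpositive and the current basis is optimal.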

2.4 Updating a Column, Selecting the Leaving Variable, and Updating Flows 

Consider some column a of the constraints for MCTP. The updated column g is formed by solving the system B*g = a. Let y = L^{-1}g, or

(2) Ly = g.

Then B*Ly = a can be solved for y, and g obtained from (2). By a suitable partitioning, B*Ly = a is the block system

B_k y_k = a_k,  k = 1, . . ., K,

P_1 y_1 + . . . + P_K y_K + Q y_R = a_R,

S_1 y_1 + . . . + S_K y_K + D y_R + y_0 = a_0.
Note that at most one of the vectors a_1, . . ., a_K will be nonzero. If all are 0, then a must correspond to some slack variable s_uv. If one is nonzero, denote it by a_l. Then the above system yields

(3) y_k = 0, for k = 1, . . ., K and k ≠ l,

(4) y_l = B_l^{-1} a_l,

(5) y_R = Q^{-1}[a_R − P_l y_l],

(6) y_0 = a_0 − S_l y_l − D y_R.

The equations (2)-(6) imply

(7) g_R = Q^{-1}[a_R − P_l y_l],

(8) g_0 = a_0 − S_l y_l − D g_R,

(9) g_k = −B_k^{-1} R_k g_{R_k}, for k = 1, . . ., K and k ≠ l,

(10) g_l = y_l − B_l^{-1} R_l g_{R_l},

where g_R is partitioned into [g_{R_1}, . . ., g_{R_K}] and each partition has the same dimension as the corresponding R_k. We are concerned with an efficient procedure for making the computations (7)-(10). Let x_E denote the variable corresponding to a, and suppose there exists an a_l ≠ 0. Consider the following sequence of steps.
STEP 1: Determine y_l = B_l^{-1} a_l.

Proposition 1 is used to construct y_l.



STEP 2: Calculate g_R = Q^{-1}[a_R − P_l y_l].

If x_E = r_ik for some (i, k), then a_R = 0. Otherwise x_E = x_uvl for some u and v, and a_R is either a unit vector or 0. If there exists a t_p = (u, v) for some p, then a_R is a unit vector with a one in the p-th component. Otherwise, a_R = 0. Attention is now turned to the calculation of the q-component vector P_l y_l. P_l is a q × (M+N) matrix whose elements are one or zero. The rows of P_l correspond to the index set T, and the columns correspond to columns of B_l. Further, each row of P_l can have at most one nonzero entry. Suppose t_p = (u, v). The p-th row of P_l will have a nonzero entry if and only if the column corresponding to x_uvl appears in B_l. If x_uvl appears in the r-th column of B_l, then P_l will have a one in the (p, r)-th element. Otherwise the p-th row will be 0'. Hence, the vector P_l y_l can be efficiently constructed as y_l is being constructed. That is, as the nonzero components of y_l are constructed, a search over the index set T can be used to construct the components of P_l y_l. Once a_R and P_l y_l have been constructed, g_R is obtained by the matrix multiplication

Q^{-1}[a_R − P_l y_l].

STEP 3: Calculate g_0 = a_0 − S_l y_l − D g_R.

If x_E = r_ik for some (i, k), then a_0 = 0. Otherwise x_E = x_uvl for some u and v, and a_0 is either a unit vector or 0. If there exists a z_p = (u, v) for some p, then a_0 is a unit vector with a one in the p-th component. Otherwise, a_0 = 0. Let us now investigate the calculation of S_l y_l. Note that S_l has the same form as P_l except that the index set T is replaced by the index set Z. Hence S_l y_l can be developed during the construction of y_l by a search over the index set Z. The components of D can be constructed using Proposition 3. Hence the vector D g_R is easily obtained once g_R is determined. Finally, the three vectors are combined to yield g_0.

STEP 4: Determine g_k = −B_k^{-1} R_k g_{R_k}, for k = 1, . . ., K and k ≠ l.

Proposition 1 can be used to construct columns of −B_k^{-1} R_k, and this matrix can be multiplied by g_{R_k}. If x_E = s_uv for some (u, v), then several other simplifications arise.

STEP 5: Determine g_l = y_l − B_l^{-1} R_l g_{R_l}.

Once the updated column has been determined, the usual primal simplex ratio test is used to 
select the leaving variable. Given the leaving variable and the updated column, the flows are easily 
updated in the usual manner. 
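The ratio test mentioned here can be written generically as follows. This is a sketch with our own names; the paper's code exploits the basis structure rather than looping over a dense column.

```python
# Generic primal simplex ratio test: xB[r] are current basic variable values
# and g[r] the entries of the updated entering column.  The leaving variable
# is the one that first drives a basic variable to zero as the entering
# variable increases.

def ratio_test(xB, g, eps=1e-10):
    """Return (leaving_index, step) or (None, inf) if the step is unbounded."""
    best_r, best_step = None, float('inf')
    for r, (xr, gr) in enumerate(zip(xB, g)):
        if gr > eps:                      # only rows with positive entries block
            step = xr / gr
            if step < best_step:
                best_r, best_step = r, step
    return best_r, best_step
```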


2.5 Updating Bases B_1, . . ., B_K, Working Basis Inverse, and Dual Variables

Hartman and Lasdon [16] have developed an excellent procedure for updating both Q^{-1} and the λ_ij's after each pivot. All updating required for the bases B_1, . . ., B_K can be efficiently performed using the Augmented Predecessor Index Method [10]. We now address the question of updating the π_ik's. Recall that π_ik − π_{M+j,k} + λ_ij − c_ijk = 0 for all basic columns x_ijk; for basic columns r_ik, π_ik = 0; while for basic columns a_jk, −π_{M+j,k} = c_jk. Hence, the π_ik's and π_{M+j,k}'s for all roots are determined. Then for all arcs incident to a root node, the rule π_ik − π_{M+j,k} + λ_ij = c_ijk can be used to determine the π not corresponding to the root node. All of the π's are uniquely determined iteratively because a rooted tree is connected and has no cycles.
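The tree traversal just described can be sketched generically. This is a simplified single-tree version with our own names (the paper's rule also carries the λ terms): fix the root's potential and propagate π outward along tree arcs using π_u − π_v = c_uv.

```python
from collections import deque

# Illustrative sketch: on a rooted spanning tree the duals are fixed by setting
# the root's potential and applying pi[u] - pi[v] = cost(u, v) along each tree
# arc, outward from the root.  adj maps a node to its tree neighbors; cost is
# keyed by (tail, head).

def tree_duals(root, adj, cost):
    pi = {root: 0.0}
    queue = deque([root])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in pi:
                # arc oriented (u, v): pi[u] - pi[v] = c_uv; reversed otherwise
                pi[v] = pi[u] - cost[(u, v)] if (u, v) in cost else pi[u] + cost[(v, u)]
                queue.append(v)
    return pi
```

Because a rooted tree is connected and acyclic, each node is reached exactly once, so the potentials are uniquely determined, as the text observes.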

2.6 Reinversion of Working Basis Inverse and Updating Dual Variables

Since Q^{-1} and the λ's are updated at each pivot rather than recalculated, the accumulation of roundoff error eventually forces Q^{-1} and λ to be recalculated from the original problem data. Proposition 2 can be used to develop an accurate Q, and then an inversion routine can be used to calculate Q^{-1}.

To find the dual variables associated with the basis B*, we need to solve the system

[π', λ'] B* = c'_{B*},

where c'_{B*} denotes the cost associated with the basic variables. Partition c'_{B*} as [c'_{B_1}, . . ., c'_{B_K}, c'_{R_1}, . . ., c'_{R_K}, 0'], where c'_{R_k} has the same dimension as R_k. By multiplying on the right by L we have the system [π', λ'] B* L = c'_{B*} L. The block-triangular structure of B*L then yields λ_0 = 0',

π'_k B_k = c'_{B_k},  k = 1, . . ., K,

and

λ'_R = (c'_{R_1} − c'_{B_1} B_1^{-1} R_1, . . ., c'_{R_K} − c'_{B_K} B_K^{-1} R_K) Q^{-1}.

Proposition 1 is used to construct columns of B_k^{-1} R_k for each k = 1, . . ., K. Next the components c'_{R_k} − c'_{B_k} B_k^{-1} R_k are determined, and finally the matrix multiplication by Q^{-1} yields λ_R.


The primal simplex specialization as described in Section 2 has been coded and was used to obtain the experience reported in Tables 1, 2, 3, and 4. The code is written entirely in FORTRAN and has been tested on a CDC Cyber 72 and a CDC 6600 using the FTN compiler. The object program and data are held in-core with all data in floating point mode. The rooted spanning forests are stored using Johnson's [21] triple labels. The Augmented Predecessor Index Method (see [10]) is used for updating the forests associated with each commodity. The working basis inverse (Q^{-1}) is updated using the procedure of Hartman and Lasdon [16]. The code consists of approximately sixteen hundred FORTRAN statements organized into a main program and fifteen subroutines. All data files appear in common blocks so that there is a minimum of time loss due to the subroutine structure. The storage requirement is MNK + 3MN + 7(M+N)K + D + 8L + 8000, where L is the maximum size of the working basis.

Experimental tests indicated that there was no significant difference in the code's running speed 
when the dual variables were recalculated at the end of a pivot rather than updating those that 
change. The experience reported in Tables 1, 2, and 3 is for a code that recalculated the dual variables 
at the end of each pivot. The variables r_ik, a_jk, and s_ij provided the starting bases for all problems.

Table 1 presents a summary of computational time as a function of the pricing rule used. 
The first positive evaluator rule implies that the first nonbasic variable encountered with revised 
cost greater than zero is selected as the entering variable. Pricing all nonbasic variables and enter- 
ing the one with largest revised cost is the largest evaluator rule. The largest evaluator in a block rule 
involves holding the commodity subscript, k, constant and pricing all nonbasic variables associated 
with that commodity. If all revised costs are negative, k is incremented and the procedure repeated. 
If a nonbasic variable prices out greater than zero for some k, then the nonbasic variable associated with commodity k with largest revised cost is selected as the entering variable. Table 2 presents
the percentage of total time spent in the pricing operation for each rule. 
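The "largest evaluator in a block" rule described above can be sketched as follows. The structure and names below are our own; the idea is simply to price one commodity block at a time and enter the best candidate from the first block containing any positive relative cost.

```python
# Sketch of the "largest evaluator in a block" pricing rule (our names).
# rc_by_block[k] is a list of (relative_cost, variable) pairs for commodity k.

def largest_in_block(rc_by_block):
    for block in rc_by_block:
        positives = [p for p in block if p[0] > 0]
        if positives:
            return max(positives, key=lambda p: p[0])
    return None   # all relative costs nonpositive: the current basis is optimal
```

Setting each block to a single variable recovers the "first positive evaluator" rule, and a single block holding all variables recovers the "largest evaluator" rule.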

The coefficients for the test problems were generated using uniform distributions with pre- 
determined upper and lower limits. The arc costs were uniformly distributed over the interval 
[0, 100] while the supplies and demands were distributed over [100, 300]. The arc capacities were 
distributed over the interval [200, 600]. 
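The test-problem generation scheme just described might look like the following sketch; the function name, seed, and array layout are our own, not the paper's.

```python
import random

# Sketch of the test-problem generator described above: arc costs uniform on
# [0, 100], supplies and demands uniform on [100, 300], and arc capacities
# uniform on [200, 600].

def generate_problem(M, N, K, seed=1977):
    rng = random.Random(seed)
    cost = [[[rng.uniform(0, 100) for _ in range(K)] for _ in range(N)] for _ in range(M)]
    supply = [[rng.uniform(100, 300) for _ in range(K)] for _ in range(M)]
    demand = [[rng.uniform(100, 300) for _ in range(K)] for _ in range(N)]
    cap = [[rng.uniform(200, 600) for _ in range(N)] for _ in range(M)]
    return cost, supply, demand, cap
```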

For our in-core code the largest evaluator rule proved superior. However, for large problems 
which can not be handled in-core, the largest evaluator in a block rule appears quite attractive. 
Additional experimental work with different partial pricing strategies is needed in order to obtain 
the best performance from the primal partitioning procedure. 

Table 3 presents our computational experience in comparing the APEX III System with the 
primal partitioning code. APEX III is Control Data Corporation's production linear programming 
system. APEX III has an in-core optimizer which uses the product form of the inverse. Only the 
distinct elements of the original constraints and the eta vectors are stored explicitly. Plus and 
minus ones are handled separately to avoid trivial multiplications and divisions by these values.
The APEX system was run twice with each of the test problems. In the first run, identity matrices 
were used for starting bases while for the second run starting bases were crashed. However, these 
different starting bases produced optimal solutions in approximately the same time. The primal 
partitioning code used identity matrices for the starting bases for all problems. 

It is anticipated that a production code would store only the nonzero elements of the working 
basis inverse. It is also conceivable that one could follow the suggestion of Kalan [23] and only 
store the distinct elements. Table 4 presents the characteristics of the working basis inverse at
optimality for several problems of various sizes. Note that a large proportion of the nonzero elements 
are plus and minus ones and the number of distinct elements is quite small. It appears that these 
characteristics are due to the fact that the elements of Q are from { — 1,0,1 }. 



















[Table 1. Computational time under each pricing rule. Table 2. Percentage of total time spent in pricing under each rule. Table 3. Comparison of the primal partitioning code with APEX III. The tabulated entries are illegible in the source scan and are not reproduced.]

Table 4. Characteristics of Q~ l 


[Entries illegible in the source scan; the table reports, for each test problem, the size of Q^{-1} and the numbers of minus ones, plus ones, and distinct elements at optimality.]

In this study we have presented computational approaches for developing an efficient imple- 
mentation of the primal partitioning simplex method for solving multicommodity transportation 
problems. These approaches can be easily extended to more general multicommodity network 
flow models and we only used the transportation structure due to the simplicity of this model. 
This study has indicated the viability of solving large multicommodity network flow problems via 
the primal partitioning technique. Our computational experience indicates that large problems in 
this class should be solvable in-core. That is, only modest storage requirements are needed for the 
working basis inverse (Q^{-1}) if it is packed and only the distinct elements are stored. Finally, this study points out the ease with which the working basis can be regenerated when reinversion is required.


The author wishes to express his appreciation to Gilbert Robertson of Plastics Manufacturing 
Company and Sarah Shipley of Control Data Corporation for helping with the computer runs 
involving the APEX system. Appreciation is also extended to Control Data Corporation for pro- 
viding the author with free computer time and access to APEX. The author is also grateful to 
Mustafa Abdulaal for his helpful comments. 


This experimental code may be obtained free of charge by writing directly to the author. 


[1] Bazaraa, M. S., "An Infeasibility Pricing Algorithm for the Multicommodity Minimal Cost 
Flow Problem," Working Paper, Industrial and Systems Engineering Department, Georgia 
Institute of Technology (1973). 

[2] Bellmore, M., G. Bennington and S. Lubore, "A Multivehicle Tanker Scheduling Problem," 
Transportation Science, 5, 36-47 (1971). 


[3] Bennett, J. M., "An Approach to Some Structured Linear Programming Problems," Operations 

Research, 14, 636-645 (1966). 
[4] Chen, H. and C. G. DeWald, "A Generalized Chain Labelling Algorithm for Solving Large

Multicommodity Flow Problems," Computers and Operations Research, 1, 437-465 (1974). 
[5] Clark, S. and J. Surkis, "An Operations Research Approach to Racial Desegregation of School 

Systems," Socio-Economic Planning Science, 1, 259-272 (1968).
[6] Cremeans, J. E., R. A. Smith and G. R. Tyndall, "Optimal Multicommodity Network Flows 

with Resource Allocation," Naval Research Logistics Quarterly, 17, 269-280 (1970). 
[7] Evans, J. R., J. J. Jarvis and R. A. Duke, "Matroids, Unimodularity, and the Multicommodity 

Transportation Problem," Presented at the Joint National Meeting of ORSA/TIMS, 

Chicago (1975). 
[8] Ford, L. R. and D. R. Fulkerson, "A Suggested Computation for Maximal Multicommodity 

Network Flows," Management Science, 5(1), 97-101 (1958). 
[9] Geoffrion, A. M. and G. W. Graves, "Multicommodity Distribution System Design By 

Benders Decomposition," Management Science, 20(5), 822-844 (1974).
[10] Glover, F., D. Karney and D. Klingman, "The Augmented Predecessor Index Method for 

Locating Stepping-Stone Paths and Assigning Dual Prices in Distribution Problems,"

Transportation Science, 6, 171-179 (1972). 
[11] Glover, Fred, D. Karney, D. Klingman and A. Napier, "A Computational Study on Starts 

Procedures, Basis Change Criteria, and Solution Algorithms for Transportation Problems," 

Management Science, 20(5), 793-813 (1974).
[12] Gomory, R. E. and T. C. Hu, "Multi-Terminal Network Flows," Journal of the Society for 

Industrial and Applied Mathematics, 9(4), 551-570 (1961). 
[13] Graves, G. W. and R. D. McBride, "A Dynamic Factorization Algorithm for General Large 

Scale Linear Programming Problems," Presented at ORSA/TIMS National Meeting in 

San Juan, Puerto Rico (1974). 
[14] Grigoriadis, M. D. and W. W. White, "A Partitioning Algorithm for the Multicommodity

Network Flow Problem," Mathematical Programming 3, 157-177 (1972). 
[15] Hartman, J. K. and L. S. Lasdon, "A Generalized Upper Bounding Method for Doubly Coupled

Linear Programs," Naval Research Logistics Quarterly, 17(4), 411-429 (1970). 
[16] Hartman, J. K. and L. S. Lasdon, "A Generalized Upper Bounding Algorithm for Multi- 
commodity Network Flow Problems," Networks, 1, 333-354 (1972). 
[17] Held, M., P. Wolfe and H. Crowder, "Validation of Subgradient Optimization," Mathematical 

Programming, 6, 62-88 (1974). 
[18] Jarvis, J. J., and P. D. Keith, "Multi-Commodity Flows with Upper and Lower Bounds," 

Presented to the Joint National Meeting of ORSA and TIMS in Boston (Spring 1974). 
[19] Jewell, W. S., "A Primal-Dual Multi-Commodity Flow Algorithm," ORC Report 66-24, 

University of California, Berkeley (1966). 
[20] Johnson, Ellis L., "Programming in Networks and Graphs," Operations Research Center 

Report No. 65-1, University of California, Berkeley (1965).
[21] Johnson, Ellis L., "Networks and Basic Solutions," Operations Research, 14(4), 619-623 (1966).

[22] Jorgensen, N. O., "Some Aspects of the Urban Traffic Assignment Problem," Graduate, 

Report, The Institute of Transportation and Traffic Engineering, University of California, 

Berkeley (1963). 


[23] Kalan, James E., "Aspects of Large-Scale In-Core Linear Programming," Proceedings of the 

1971 Annual Conference of the ACM, Chicago, Ill., 304-313 (1971).
[24] Kaul, R. N., "An Extension of Generalized Upper Bounded Techniques for Linear Program- 
ming," ORC 65-27, Operations Research Center, University of California, Berkeley (1965), 
[25] Langley, R. W., J. L. Kennington and C. M. Shetty, "Efficient Computational Devices for 

the Capacitated Transportation Problem," Naval Research Logistics Quarterly, 21(4),

637-647, (1974). 
[26] Lasdon, Leon S., Optimization Theory for Large Systems (Macmillan Company, New York, 1970).

[27] LeBlanc, L. J., "Mathematical Programming Algorithms for Large Scale Network Equilibrium 

and Network Design Problems," unpublished dissertation, Dept. of Industrial Engineering 

and Management Sciences, Northwestern University (1973). 
[28] Saigal, Romesh, "Multicommodity Flows in Directed Networks," ORC Report 66-24,

University of California, Berkeley (1966). 
[29] Sakarovitch, Michel, "The Multi-Commodity Maximal Flow Problem," ORC Report 66-25, 

University of California, Berkeley (1966). 
[30] Srinivasan, V. and G. L. Thompson, "Benefit-Cost Analysis of Coding Techniques for the

Primal Transportation Algorithm," Journal, Association for Computing Machinery 20, 

194-213 (1973). 
[31] Swoveland, Cary, "Decomposition Algorithms for the Multi-Commodity Distribution 

Problem," Working Paper No. 184, Western Management Science Institute, UCLA (1971). 
[32] Tomlin, J. A., "Minimum-Cost Multicommodity Network Flows," Operations Research, 

14(1), 45-51 (1966). 
[33] White, W. W. and E. Wrathall, "A System For Railroad Traffic Scheduling," Tech. Report

No. 320-2993, IBM— Philadelphia Scientific Center (1970). 
[34] White, W. W., "Mathematical Programming, Multicommodity Flows, and Communication

Nets," Proceedings of the Symposium on Computer — Communications Networks and 

Teletraffic at Polytechnic Institute of Brooklyn, 325-334 (1972). 
[35] Wollmer, R. D., "Multicommodity Networks with Resource Constraints: The Generalized

Multicommodity Flow Problem," Networks, 1, 245-263, (1972). 




Leon Cooper and Larry J. LeBlanc 

Southern Methodist University 
Dallas, Texas 


A class of convex programming problems with network type constraints is 
addressed and an algorithm for obtaining the optimal solution is described. The 
stochastic transportation problem (minimize shipping costs plus expected holding 
and shortage costs at demand points subject to limitations on supply) is shown to 
be amenable to the solution technique presented. Network problems whose objec- 
tive function is non-separable and network problems with side constraints are also 
shown to be solvable by the algorithm. Several large stochastic transportation 
problems with up to 15,000 variables and non-negativity constraints and 50 supply 
constraints are solved. 


In previous papers (see [2, 3]), the Frank-Wolfe convex programming algorithm has been applied to the network equilibrium problem and to a nonlinear transportation-production problem. The results indicated that the algorithm is an efficient computational method compared with existing alternatives. In this paper, we point out a much wider class of problems for which this approach has proven to be surprisingly efficient. In particular, we give several examples of relatively large-scale (5,000-15,000 variables) nonlinear programming problems with very modest computational times (30-50 seconds on a CDC Cyber 70, Model 72 for the 5,000 variable problems).

The class of problems we consider is of the form

Min f(x)
subject to Ax = b,  x ≥ 0,   (NLP)

where x ∈ R^n, A is m × n, b ∈ R^m, and f(x) is a convex differentiable function. We consider problems where (NLP) would be readily solvable if f(x) were a linear function. Examples include convex minimum cost flow problems (single or multicommodity), stochastic transportation problems, and network problems with linear or convex side constraints. In the latter case the side constraints can be included in the objective function by means of a penalty function, which results in a problem of the form (NLP). An additional class of problems to which this general approach is applicable



are those whose constraints exhibit a network structure and have nonseparable (convex) objective functions.
The Frank-Wolfe algorithm is described in Zangwill [6], pp. 158-162. Briefly, the method is as follows: the algorithm determines a search direction by solving the linear programming subproblem of minimizing a first-order Taylor approximation to f(·) about a feasible solution x^k, subject to the constraints of (NLP):

Min f(x^k) + ∇f(x^k) · (z − x^k)
subject to Az = b,  z ≥ 0.   (SP)

The terms f(x^k) and ∇f(x^k) · x^k are constant and may be omitted when solving the subproblem (SP). If z^k is the optimal solution to (SP), then the search direction is defined to be d^k = z^k − x^k. After searching in this direction, a new point x^{k+1} is obtained and the process is repeated. The proof that the sequence x^k converges to x*, the optimal solution to (NLP), is given in [6].

Although the computational effort of solving the linear program (SP) may seem to be un- 
necessarily high just to find a search direction, this has not proved to be the case. In the problems 
described in this paper, problem (SP) is solvable by inspection or by simple network techniques. 
Although the Frank-Wolfe algorithm is known to be only linearly convergent, computational
results have indicated that for large scale problems of the type described above, the total compu- 
tational effort (i.e., the number of iterations multiplied by the computational effort per iteration) 
is considerably less than that of alternative solution techniques. Several examples are given in 
Section 5. 

At each iteration of the Frank-Wolfe procedure, a lower bound on the optimal value of (NLP) is available by noting that:

f(x*) ≥ f(x^k) + ∇f(x^k) · (x* − x^k) ≥ f(x^k) + ∇f(x^k) · (z^k − x^k).

See [3] for a derivation of this result.
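As an illustration only, the Frank-Wolfe loop can be sketched on a toy problem of our own devising (the objective, feasible set, and names below are not from the paper): minimize f(x) = Σ(x_i − t_i)² over the simplex {x ≥ 0, Σx_i = 1}, where the linear subproblem over a simplex is solved by inspection by placing all weight on the coordinate with the smallest gradient component.

```python
# Minimal Frank-Wolfe sketch on a toy quadratic over the probability simplex.
# The linear subproblem (SP) is solved by inspection: the minimizing vertex
# puts all weight on the coordinate with the smallest gradient component.

def frank_wolfe(t, iters=200):
    n = len(t)
    x = [1.0 / n] * n                       # feasible starting point
    for k in range(iters):
        grad = [2.0 * (x[i] - t[i]) for i in range(n)]
        j = min(range(n), key=lambda i: grad[i])
        z = [0.0] * n
        z[j] = 1.0                          # vertex of the simplex solving (SP)
        step = 2.0 / (k + 2.0)              # standard diminishing step size
        x = [(1.0 - step) * x[i] + step * z[i] for i in range(n)]
    return x
```

Each iterate is a convex combination of feasible points, so feasibility is maintained without any projection; this mirrors why the method suits the network problems discussed below.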


The stochastic transportation problem is concerned with how to choose quantities to be shipped 
from supply points to demand points when the requirements at destinations are random variables 
rather than known constants. Since customer demands are not known, if a certain quantity of 
material is shipped to some destination, then an expected holding cost and an expected shortage
cost is incurred. In the stochastic transportation problem, we wish to choose amounts to be shipped 
from each supply point to each demand point in order to minimize shipping costs (which are de- 
terministic) plus expected holding and shortage costs. The problem is shown to be a convex non- 
linear programming problem in [1]. In the stochastic transportation problem, we consider m existing 
supply points, each with a known supply of Si, i=l, 2, . . . , m. We are also given n demand points, 



each with demand D_j, j = 1, 2, . . ., n, where D_j is a random variable with density function φ_j(v). We then wish to choose shipments x_ij from supply points to demand points to

(1) Min Σ_{i=1}^{m} Σ_{j=1}^{n} c_ij x_ij + Σ_{j=1}^{n} [ h_j ∫_0^{y_j} (y_j − v) φ_j(v) dv + p_j ∫_{y_j}^{∞} (v − y_j) φ_j(v) dv ]

subject to

(2) y_j = Σ_{i=1}^{m} x_ij,  j = 1, 2, . . ., n,

(3) Σ_{j=1}^{n} x_ij ≤ S_i,  i = 1, 2, . . ., m,   (STP)

(4) x_ij ≥ 0,  i = 1, 2, . . ., m; j = 1, 2, . . ., n,


where

c_ij = unit shipping cost from supply point i to destination j,

h_j = unit holding cost at destination j,

φ_j(v) = probability density function for demand at destination j,

p_j = unit shortage cost at destination j,

S_i = supply at i.
In (1), y_j is the total amount shipped into destination j from all sources, and so

∫_0^{y_j} (y_j − v) φ_j(v) dv

is the expected amount of material which must be held at destination j. Similarly,

∫_{y_j}^{∞} (v − y_j) φ_j(v) dv

is the expected shortage. Therefore the expected holding cost at destination j is

h_j ∫_0^{y_j} (y_j − v) φ_j(v) dv

and the expected shortage cost is

p_j ∫_{y_j}^{∞} (v − y_j) φ_j(v) dv.
Thus we are minimizing shipping costs plus expected holding and shortage costs. Constraints (2) are definitional constraints relating y_j, the total amount shipped into demand point j, to the shipments x_ij. Constraints (3) are supply constraints for each supply point i.

We now discuss the computational aspects of the Frank-Wolfe algorithm for (STP) and indicate why it is more efficient than any other technique that we are aware of for this problem. In the Frank-Wolfe algorithm a linear programming subproblem must be solved at each iteration. The objective function of (SP) changes at each iteration; at iteration k the objective function is ∇f(x^k) · z, where x^k is the current vector of flows. It is shown in [1] that the nonlinear objective function (1) can be written

(5) f(x) = Σ_{i=1}^{m} Σ_{j=1}^{n} c_ij x_ij + Σ_{j=1}^{n} [ h_j y_j + (h_j + p_j) ∫_{y_j}^{∞} (v − y_j) φ_j(v) dv ],

where x has components x_ij. Now define

(6) g_j(y_j) = h_j y_j + (h_j + p_j) ∫_{y_j}^{∞} (v − y_j) φ_j(v) dv.
An examination of (5) shows that

∂f/∂x_ij = c_ij + h_j + (h_j + p_j) [ −(v − y_j) φ_j(v) |_{v=y_j} − ∫_{y_j}^{∞} φ_j(v) dv ].

The first term within the brackets is zero, and we have therefore:

(7) ∂f/∂x_ij = c_ij + h_j − (h_j + p_j) ∫_{y_j}^{∞} φ_j(v) dv.

Now if x^k is a feasible solution for (STP), we define:

c̄_ij = ∂f/∂x_ij, evaluated at x = x^k.
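Equations (6) and (7) can be checked on a simple density. The example below is entirely our own: a hypothetical uniform demand on [a, b], for which the expected-shortage integral has a closed form.

```python
# Worked check of (6) and (7) for demand uniform on [a, b], with a <= y <= b.
# For this density, E[(D - y)+] = (b - y)^2 / (2*(b - a)).

def g_uniform(y, a, b, h, p):
    """Expected cost g_j(y) of (6) for uniform demand on [a, b]."""
    shortage = (b - y) ** 2 / (2.0 * (b - a))
    return h * y + (h + p) * shortage

def g_uniform_deriv(y, a, b, h, p):
    """Derivative h - (h + p) * P(D > y), matching (7) without the c_ij term."""
    return h - (h + p) * (b - y) / (b - a)
```

A finite-difference check of `g_uniform` against `g_uniform_deriv` confirms that the boundary term in the Leibniz differentiation vanishes, as the text states.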

Then when using the Frank-Wolfe algorithm on the stochastic transportation problem, the subproblem (SP) becomes

(8) Min Σ_{i=1}^{m} Σ_{j=1}^{n} c̄_ij z_ij

subject to

Σ_{j=1}^{n} z_ij ≤ S_i,  i = 1, 2, . . ., m,

z_ij ≥ 0,  i = 1, 2, . . ., m; j = 1, 2, . . ., n.

The constraints in the linear program (SP) never change; they are supply constraints and non-negativity at each iteration. Constraints (2) are not included because they are only definitional; they would not be present in (STP) and therefore would not be in the subproblems (SP). Instead, the variables y_j would be replaced by Σ_{i=1}^{m} x_ij in the objective function (1) and deleted from the problem.

The key to the computational success of the Frank-Wolfe algorithm for solving the stochastic transportation problem is that there are no constraints requiring any material to be shipped to the destinations. Thus, since the objective function is linear in the subproblems, each subproblem decomposes into m separate problems, one for each supply point. Because of this, the optimal solution to each subproblem is obvious by inspection. The optimal solution to subproblem (SP) is obtained at each iteration by examining each source i and choosing

c̄_ik = min_j c̄_ij.
The optimal solution to each (SP) is then given as follows:

(9) If c̄_ik < 0, then z_ik = S_i and z_ij = 0 for j ≠ k.

(10) If c̄_ik ≥ 0, then z_ij = 0, j = 1, 2, . . ., n.

In other words, the optimal solution is to ship everything available to demand point k if c̄_ik < 0, and to ship nothing if c̄_ik ≥ 0. This is possible because there are no constraints requiring material to be shipped to any destination. The simple form shown in (9) and (10) is the reason that the Frank-Wolfe algorithm is so efficient for large stochastic transportation problems. Numerical results are shown in Section 5.
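The inspection rule (9)-(10) might be coded as follows; the function and argument names are our own.

```python
# Solve subproblem (SP) by inspection, per (9)-(10): each source i ships its
# entire supply to its cheapest destination if that reduced cost is negative,
# and ships nothing otherwise.

def solve_subproblem(rc, supply):
    """rc[i][j]: reduced costs c̄_ij; supply[i]: S_i.  Returns nonzero flows."""
    z = {}
    for i, row in enumerate(rc):
        k = min(range(len(row)), key=lambda j: row[j])
        if row[k] < 0:
            z[(i, k)] = supply[i]          # ship everything to destination k
    return z                               # omitted pairs carry zero flow
```

The work per iteration is a single pass over the cost matrix, which is why the subproblem cost is negligible compared with a general LP solve.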

Since the stochastic transportation problem is basically a network problem, an obvious approach is to use piecewise linear approximation and a minimal cost flow algorithm. To compare the Frank-Wolfe technique with piecewise linear approximation, a ten-source, 100-destination problem was solved by both algorithms. Shipping costs, supplies, demands, etc., for this problem were generated as random numbers. The data are described in Section 5.

The linear cost network used to model this problem is shown in Figure 1. Nodes 1-10 are supply points and nodes 11-110 are demand points. One hundred sets of 10 nodes each, namely nodes 111-120, . . ., 1101-1110, were used so that a 10-piece linear approximation to the expected holding and shortage cost function at the corresponding destination could be used. Ten linear pieces were necessary to achieve 2% accuracy (the same accuracy was demanded of the Frank-Wolfe algorithm).

Computing time for the Out of Kilter algorithm was 48.5 seconds on the CDC Cyber 70, Model 72; the Frank-Wolfe technique took only 10.7 seconds. In addition, the Out of Kilter technique required more than 69,000 words of memory and the Frank-Wolfe technique required less than 24,000 words. Because of this memory difference, the Out of Kilter algorithm was actually 6.5 times more expensive to run.

More importantly, the largest stochastic transportation problem that the Out of Kilter algorithm could handle was 10 × 100 (total memory available was only 70,000 words). The Frank-Wolfe technique, because of its smaller memory requirements, could handle problems 15 times as large.

Hadley [1] has proposed that Dantzig-Wolfe decomposition be applied to the piecewise linear approximation of a stochastic transportation problem. However, it is known [5] that Dantzig-Wolfe decomposition usually requires much greater computational effort than the ordinary simplex method for the same problem.

Another solution technique attempted for (STP) was the penalty approach. A quadratic 
penalty function and the conjugate gradient algorithm [4] were coded for the same 10X100 
stochastic transportation problem. This approach was abandoned when it failed to converge even 
after several hundred seconds of CPU time. 


A variation of the classic multi-commodity transshipment problem, in which the arcs are uncapacitated but the objective function penalizes flow exceeding a specified threshold, is easily solved by the Frank-Wolfe algorithm. Letting x_ij^s denote the flow of commodity s along arc ij,



[Figure 1: network for the ten-source, 100-destination problem; image not reproduced.]
S_i^s denote the supply of s at supply point i, and D_j^s denote the demand for s at j, the problem is to



(11) Min Σ_{ij} f_ij(Σ_{s=1}^{p} x_ij^s)

subject to

(12) Σ_i x_ij^s = D_j^s,  all demand points j = 1, 2, . . ., n; all products s = 1, 2, . . ., p,

(13) Σ_j x_ij^s ≤ S_i^s,  all supply points i = 1, 2, . . ., m; all products s = 1, 2, . . ., p,

(14) x_ij^s ≥ 0,  all i, j, s.


In (11), f_ij(·) is the shipping cost function for arc ij. In urban network models, f_ij(·) is the travel
time on arc ij, and x_ij^s represents the flow of automobile traffic on arc ij with destination s. The
functional form used by the U.S. Bureau of Public Roads is

f_ij( Σ_s x_ij^s ) = A_ij( Σ_s x_ij^s ) + B_ij( Σ_s x_ij^s )^4

where A_ij and B_ij are specified parameters for arc ij. The parameter B_ij is chosen small (typically
10^-5) so that f_ij(·) is nearly linear for small flow values. For large flow rates congestion occurs and
f_ij(·) increases much faster than linearly. When using the Frank-Wolfe algorithm to solve the
multi-commodity network problem (11)-(14), each subproblem is a multi-commodity transship-
ment problem with no arc capacities. Because there are no arc capacities, each subproblem splits
up into p separate transshipment problems, one for each commodity.
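The congestion cost is cheap to evaluate per arc. A minimal sketch, assuming the standard fourth-power Bureau of Public Roads form; the A_ij and B_ij values are illustrative:

```python
def bpr_cost(total_flow, a_ij=1.0, b_ij=1e-5):
    """Arc shipping cost f_ij as a function of the total flow on arc ij:
    nearly linear for small flows (b_ij is tiny), but the fourth-power
    term dominates once congestion sets in."""
    return a_ij * total_flow + b_ij * total_flow ** 4
```

With b_ij = 10^-5, the congestion term contributes only 0.1 at a flow of 10 but 1000 at a flow of 100.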

In [2], the network equilibrium problem was solved using the Frank-Wolfe algorithm. The
network equilibrium problem is a special type of multicommodity network flow problem in which
the constraints specify that a certain amount of automobile traffic must flow between each pair
of nodes. As in constraints (12)-(14), there are no arc capacities; instead, the nonlinear objective
function prevents excessive flows on any arc at optimality. The Frank-Wolfe subproblems are
even simpler for the network equilibrium problem. Since the constraints state only that a certain
amount of traffic must travel between each pair of nodes, the Frank-Wolfe subproblems are simply
shortest route problems. In [2] a nonlinear program with 1824 variables and non-negativity con-
straints and 552 conservation of flow constraints was solved in 9 seconds on the CDC 6400 com-
puter. Approximately 20 iterations of the Frank-Wolfe algorithm were required. The same problem
was approximated by a piecewise linear function and solved by the simplex method (the multi-
commodity aspect required that the simplex method be used). Computing time on the 6400 was
11 minutes and 40 seconds, more than 77 times the computation time of the Frank-Wolfe
algorithm. The optimal values obtained from the Frank-Wolfe algorithm and from the simplex
method differed by only 1.2%.
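The Frank-Wolfe loop itself is short: linearize the objective, solve the resulting easy subproblem (a shortest-route or transshipment problem in the applications above), and do a one-dimensional search. A self-contained sketch on a toy quadratic over the unit simplex; the instance is our own illustration, not one of the problems from [2]:

```python
def frank_wolfe(grad, linear_oracle, x0, iters=50):
    """Frank-Wolfe: repeatedly minimize the linearized objective over the
    feasible region, then line-search between the current point and the
    resulting vertex.  The exact step below assumes a quadratic objective
    with Hessian 2I, which holds for the toy instance underneath."""
    x = [float(v) for v in x0]
    for _ in range(iters):
        g = grad(x)
        y = linear_oracle(g)                      # vertex minimizing g . y
        d = [yi - xi for yi, xi in zip(y, x)]
        dd = sum(di * di for di in d)
        gd = sum(gi * di for gi, di in zip(g, d))
        t = 0.0 if dd == 0 else min(1.0, max(0.0, -gd / (2.0 * dd)))
        x = [xi + t * di for xi, di in zip(x, d)]
    return x

def simplex_vertex(g):
    """Linear subproblem over the unit simplex: the best vertex is e_i
    for the smallest gradient component."""
    i = min(range(len(g)), key=lambda j: g[j])
    return [1.0 if j == i else 0.0 for j in range(len(g))]

# toy instance: min (x1 - .3)^2 + (x2 - .7)^2 subject to x1 + x2 = 1, x >= 0
x = frank_wolfe(lambda x: [2 * (x[0] - 0.3), 2 * (x[1] - 0.7)],
                simplex_vertex, [1.0, 0.0])
```

For the network problems discussed in the text, `linear_oracle` would be replaced by a shortest-route or transshipment solver applied to the linearized costs.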


It is possible that a mathematical programming model may exhibit a network structure in its
constraints but have a general (i.e., non-separable) convex objective function. The problem would
then be considered a general nonlinear programming problem which would not be readily amenable
to the conventional approach of piecewise linear approximation. The problem we consider is again
(NLP) where the matrix A has a transportation or other simple structure characteristic of net-
works. A simple example of such a problem is as follows. Consider the job shop network of Figure 2.
Items continuously flow from the source (node 1) to the destination (node 19). Each item must
be processed first on any one of the four lathes (nodes 2, 3, 4, or 5), and then on either of the
two drill presses, unless lathe 5 is used. From the figure, we see that the items must then be shaped,
sanded, and welded. Finally, each item must be drilled again. It should be emphasized that the
drill presses associated with nodes 17 and 18 are identical machines to the drill presses designated
by nodes 6 and 7, respectively. We see from the figure that all jobs flowing along arcs (2, 6), (3, 6),
and (4, 6) must be processed by the operator of the drill press designated by node 6. In addition,
jobs flowing along arcs (12, 17), (13, 17), (14, 17), (15, 17), and (16, 17) must be processed by the
drill press operator at node 17. However, since drill presses 6 and 17 are identical and have the
same operator, the processing cost associated with these arcs is

(15)    f(x) = f(x_26, x_36, x_46, x_{12,17}, x_{13,17}, x_{14,17}, x_{15,17}, x_{16,17})
             = j(x_26 + x_36 + x_46 + x_{12,17} + x_{13,17} + x_{14,17} + x_{15,17} + x_{16,17})

We assume that j(·) is convex; a common example would be when overtime costs must be paid or
an additional operator hired if the total flow into nodes 6 and 17 is too large.

Figure 2

Letting t(x) = x_26 + x_36 + x_46 + x_{12,17} + x_{13,17} + x_{14,17} + x_{15,17} + x_{16,17}, we have
f(x) = j(t(x)). A typical functional form would be that shown in Figure 3:


(16)    j(t(x)) = ct + dt^2

The parameter d is chosen sufficiently small so that j(·) is nearly linear except for flow values
exceeding some threshold.
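In code, the non-separability of a cost like (15)-(16) is immediate: the cost depends only on the total flow into the two identical drill presses, so it cannot be written as a sum of single-arc terms. A sketch with illustrative constants c and d:

```python
def drill_press_cost(x, c=1.0, d=1e-4):
    """Processing cost j(t(x)) = c*t + d*t**2, where t is the TOTAL flow
    on the eight arcs entering the two identical drill presses (nodes 6
    and 17 of Figure 2).  Expanding t**2 produces the cross products
    that rule out direct separable programming."""
    arcs = [(2, 6), (3, 6), (4, 6),
            (12, 17), (13, 17), (14, 17), (15, 17), (16, 17)]
    t = sum(x.get(arc, 0.0) for arc in arcs)
    return c * t + d * t * t
```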

Obviously the function in (16) contains many cross products, and so separable programming 
cannot be used directly. Although separability can easily be induced in (16), to do so would destroy 
the network structure of the constraints, leaving a general linear programming problem as an 
approximation to the scheduling problem. 

On the other hand, when the Frank-Wolfe algorithm is used on a network problem with a
non-separable objective function, each subproblem is obviously a network problem. In fact, if
the scheduling problem for the network in Figure 2 is a minimum cost flow problem (with increasing
marginal costs instead of capacities), then each subproblem is a shortest path problem.

Many other examples of non-separable problems occur in the scheduling of manufacturing
processes. Examples include situations where distinct work stations can be monitored by a single
individual at low flow volumes. However, when the total flow volume into these distinct stations
becomes large, additional monitors must be procured.

Figure 3


Frequently network problems which arise in practice are complicated by the presence of a 
few side constraints. For example, we may have an assignment problem with additional linear or 
convex constraints which destroy its special structure. Such a problem would be of the form 

(17)    Min c^T x

(18)    subject to  Ax = b

(19)    g_i(x) ≤ 0,   i = 1, 2, ..., m
where the problem is easily solved in the absence of constraints (19). We can cope with the problem
by forming the barrier function [6]:

(20)    B_k(x) = c^T x − Σ_{i=1}^m r_k / g_i(x)

We then must solve

Min_{Ax=b, x≥0}  c^T x − Σ_{i=1}^m r_k / g_i(x)


This is a convex programming problem amenable to the solution technique described previously,
although if there are too many side constraints, the Frank-Wolfe technique would probably have
difficulty because of the poor eigenvalue structure of the barrier function. If the objective function
(17) were convex instead of linear, then the barrier function (20) would still be convex, and the
Frank-Wolfe technique would still be applicable.
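The barrier function (20) is straightforward to evaluate; a minimal sketch (the penalty weight r corresponds to the r_k of the text and is driven toward zero over successive minimizations):

```python
import math

def barrier(x, c, gs, r):
    """Barrier function (20): B(x) = c'x - r * sum_i 1/g_i(x).
    Finite only in the interior g_i(x) < 0; each -r/g_i(x) is a positive
    penalty that blows up as a side constraint becomes active."""
    vals = [g(x) for g in gs]
    if any(v >= 0 for v in vals):
        return math.inf          # outside the region enforced by (19)
    linear = sum(ci * xi for ci, xi in zip(c, x))
    return linear - r * sum(1.0 / v for v in vals)
```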


To test the efficiency of the Frank-Wolfe algorithm, several large scale stochastic transporta-
tion problems were solved. For debugging purposes, a small 3 source, 3 destination problem was
used; next, fourteen 25 by 200 problems were solved. In all of the stochastic transportation problems
the coordinates of each supply and demand point were chosen as uniform random numbers between
0 and 100. Demand was assumed to be exponentially distributed at each demand point; the para-
meters λ_j were chosen as uniform random numbers in the interval [.005, .025]. Since the expected
demand at destination j equals 1/λ_j, demands were in the range [40, 200].
Supplies were chosen randomly in the interval [125, 175]; shipping costs were chosen proportional
to the distances between supply and demand points. Holding costs were in the range [3, 6], while



shortage costs varied between 20 and 60. This was done so that shortage costs would be significantly 
greater than holding costs and shipping costs. 
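The test problems just described can be generated mechanically. A sketch following the recipe in the text; the proportionality constant 0.01 for shipping costs is an illustrative stand-in for the constants actually tuned in the experiments:

```python
import random

def random_stp_instance(m, n, seed=0):
    """Random stochastic transportation test problem: points uniform on
    [0,100]^2, exponential demand rates in [.005,.025] (so expected
    demands lie in [40,200]), supplies in [125,175], holding costs in
    [3,6], shortage costs in [20,60], shipping costs proportional to
    Euclidean distance."""
    rng = random.Random(seed)
    supply_pts = [(rng.uniform(0, 100), rng.uniform(0, 100)) for _ in range(m)]
    demand_pts = [(rng.uniform(0, 100), rng.uniform(0, 100)) for _ in range(n)]
    lam = [rng.uniform(0.005, 0.025) for _ in range(n)]   # E[demand_j] = 1/lam_j
    supply = [rng.uniform(125, 175) for _ in range(m)]
    hold = [rng.uniform(3, 6) for _ in range(n)]
    short = [rng.uniform(20, 60) for _ in range(n)]
    ship = [[0.01 * ((si[0] - dj[0]) ** 2 + (si[1] - dj[1]) ** 2) ** 0.5
             for dj in demand_pts] for si in supply_pts]
    return supply, lam, hold, short, ship
```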

In a previous paper [3] which used the Frank-Wolfe algorithm, the authors also studied prob-
lems in which the objective function included linear functions and non-linear functions. For such
problems, it was noted that the number of iterations of the Frank-Wolfe algorithm required for
any given degree of accuracy depended upon the ratio of the non-linear costs to the linear costs. For
that reason, in this paper we have chosen costs such that at optimality the nonlinear expected
holding and shortage costs accounted for approximately 95-97% of the total costs. This was accom-
plished by choosing appropriate proportionality constants in calculating the shipping costs.

Computational results for the above problems are as follows. For the 3×3 problem the number
of iterations for a solution accurate to within 5% of the lower bound was 5; 9 iterations were re-
quired for 2% accuracy. For the fourteen 25×200 problems, average computing times and numbers
of iterations are shown in Table 1. Remarkably, we see that the number of iterations required for
2% accuracy for the 9-variable problem and the 5000-variable problems differed by only a factor
of two.

Table 1. Average number of iterations and CPU time (CYBER 70, Model 72)

                   Average number of            Average CPU
                   iterations and std. dev.     time (seconds)
    5% accuracy        10.2 ± 1.1
    3% accuracy        13.6 ± 1.6
    2% accuracy        17.4 ± 1.6


Finally, ten 50×300 stochastic transportation problems were solved. These 15,000 variable
problems proved more difficult to solve as accurately as the smaller problems (perhaps because 
of round off errors). Because of the higher computing times, these latter problems were solved only 
to 3.5% accuracy. In practical problems as large as these, we feel that 3.5% accuracy is probably 
more accurate than the values of the parameters used and the assumptions of linear shipping cost 
and unit holding and shortage costs. Average number of iterations and computer time were 78.9 
and 9 minutes, 55 seconds, respectively. 

It appears from the above numerical results that large-scale stochastic transportation problems 
can be solved quite efficiently using the technique described in this paper. These results indicate 
that the number of iterations increases very slowly with problem size. Also, the computational
effort for each iteration consists of scanning each column of an m × n matrix exactly once and a one-
dimensional search of mn variables. Therefore the computational effort for each iteration increases
only linearly with problem size.


We have addressed a class of convex network problems and have shown that, by capitalizing
on their structure, the Frank-Wolfe algorithm becomes extremely efficient for large-scale problems.
Several different examples of convex network problems have been considered. In each case, we have 


shown that the structure of the problem can be exploited to yield an efficient solution algorithm 
even for realistically large problems. 


[1] Hadley, G., Nonlinear and Dynamic Programming (Addison-Wesley, Reading, 1964).
[2] LeBlanc, L. J., E. K. Morlok and W. P. Pierskalla, "An Efficient Approach to Solving the Road
    Network Equilibrium Traffic Assignment Problem," Transportation Research, 9, 309-318 (1975).
[3] LeBlanc, L. J. and L. Cooper, "The Transportation-Production Problem," Transportation
    Science, 8 (4), 344-354 (1974).
[4] Luenberger, D., Introduction to Linear and Nonlinear Programming (Addison-Wesley, Reading, 1973).
[5] Wacker, W. D., "A Study of the Decomposition Algorithm for Linear Programming," M.S.
    Thesis, Washington University (1967).
[6] Zangwill, W., Nonlinear Programming: A Unified Approach (Prentice-Hall, Englewood Cliffs, 1969).



I. Gertsbach 

Ben Gurion University of the Negev, 
Beersheva, Israel 


Heuristic algorithms for positioning a maximal number of independent units 
and constructing a schedule with minimal fleet size are proposed. These algorithms 
consist of two stages: defining the "leading" job and finding an optimal position 
for it. Decisions at both stages use some special criteria which have a probabilistic
interpretation. Some experimental data are given.


The solution of many scheduling problems can be reduced to positioning, under certain con-
ditions, units and zeroes in matrices. For example, in production schedules, the k-th element of the
j-th row, a_jk, may represent the k-th time interval for the j-th machine; a_jk = 1 (0) means that in this
time interval the machine is working (vacant). Constructing a schedule for processing a group of
parts on a group of machines means finding "good" positions in such a way that the prescribed tech-
nological sequence is guaranteed for every part, and every machine is functioning under proper con-
ditions. Another example is that of transportation schedules, where the j-th row corresponds to a dis-
crete time scale for the j-th station, units represent busy time moments, and zeroes represent non-
occupied moments. A trip is defined by a sequence of stations and intervals between departures and
arrivals. Selecting a trip in a schedule means, in fact, a displacement of "chains" consisting of units
in station time scales. Such displacement, of course, must be done while satisfying given safety conditions.

Many deterministic procedures have been proposed for solving problems similar to those 
mentioned above. They are based on graph theory [1], discrete linear programming [6] or heuristic 
approaches. For solving certain specific problems, many effective algorithms have been proposed, 
e.g., for Johnson's problem [5]. 

If we try to formalize a real scheduling situation, we have to take into consideration many
restrictions and an essential increase in the size of the problem; this will make our "exact" solution
more realistic and applicable but at the same time much more difficult to find.



We can imagine the process of schedule construction as a sequential procedure. At each step, 
we solve a subproblem for disposing a specific group of units (variables) in free positions. This must 
be done, of course, by preserving all prescribed requirements. When the size of the matrix is large 
(in a realistic schedule it might be 200X200) the positioning of units fixed earlier might be assumed 
as random. This suggests the idea of approaching scheduling from a probabilistic point of view. 

This paper contains several heuristic algorithms which use probabilistic ideas. Some formal
statements are also proved which serve as a guide for constructing heuristic algorithms. These
algorithms might be useful in constructing large schedules where the traditional methods are not applicable.

Section 2 deals with the problem of positioning independent and dependent units in a matrix
with a given set of free and forbidden positions in every row. Two statements serve as a guide for
the proposed algorithms. The first asserts that a specially chosen measure (entropy) is closely
connected with the possibility of disposing the maximal number of independent units. The second
asserts that there is a "universal" row which has all free positions available. By combining these
statements, an algorithm is proposed and some numerical results are given. Section 3 deals with a
job-shop scheduling problem in which each job occupies a machine for a fixed time length but its
starting time can be shifted within certain tolerances. The problem is to find actual starting times
for each job providing a minimal number of busy machines (minimal fleet size). A heuristic algorithm
similar to the previous one is given and some experimental data are presented. Section 4 contains
some concluding remarks about the probabilistic interpretation of the sequential procedure of con-
structing large-size schedules.


2.1 Basic Definitions 

A square matrix A = ||a_jk||_{M×M} is given. I_j denotes the set of all free positions in the j-th row;
I_j is nonempty, j = 1, ..., M. Other positions in the j-th row are forbidden. A set of units disposed
in the free positions is called independent (an ISU) if in every row and column there is not more than
one unit. The deterministic algorithm for finding a maximal ISU is well known [1]. We give an
alternative heuristic algorithm which exploits some probabilistic ideas.

Let Ã = ||ã_jk||_{M×M} be a matrix with elements

(2.1)    ã_jk = 0 if a_jk ∉ I_j;    ã_jk = |I_j|^{-1} if a_jk ∈ I_j

(|I| denotes the number of elements in the set I).
The values

(2.2)    b_k = M^{-1} Σ_{j=1}^M ã_jk,   k = 1, ..., M

form a discrete M-point distribution. Let us introduce its entropy

(2.3)    E_A = −Σ_{k=1}^M b_k log b_k

and notice that its maximal value E_A* = log M corresponds to the uniform distribution b_k = M^{-1}.
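The matrix Ã of (2.1), the column distribution (2.2), and the entropy E_A can be computed in a few lines; a minimal sketch (column indices 0-based):

```python
import math

def row_normalized_matrix(free_sets, M):
    """A-tilde from (2.1): 1/|I_j| on the free positions of row j, 0
    elsewhere.  free_sets[j] is the set I_j of free column indices."""
    return [[1.0 / len(I) if k in I else 0.0 for k in range(M)]
            for I in free_sets]

def column_entropy(A):
    """Entropy of b_k = M^-1 * sum_j a_jk; it attains its maximum log M
    exactly when the column distribution is uniform."""
    M = len(A)
    b = [sum(row[k] for row in A) / M for k in range(len(A[0]))]
    return -sum(bk * math.log(bk) for bk in b if bk > 0.0)
```

For the identity pattern I_j = {j} the entropy equals log M, the maximal value.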

2.2 Kordonsky's Statement

The following statement was asserted by H. Kordonsky (1972).

STATEMENT 1: If E_A = E_A* then there is an ISU which contains exactly M elements.

PROOF: The condition E_A = E_A* means that Ã is bistochastic. Applying the Birkhoff theorem
(see [7], Part 2, Sections 1.4-1.7) one obtains Ã = Σ_{i=1}^n C_i A_i*, C_i > 0, Σ_{i=1}^n C_i = 1, where each A_i* is a 0/1
matrix with an ISU containing M elements. This completes the proof.

Now let A = ||a_jk||_{M×N}, M ≤ N, be a rectangular matrix. We construct in a similar way the ma-
trix Ã = ||ã_jk||_{M×N} and define the distribution {b_k, k = 1, ..., N}. Its entropy is maximal when all
b_k = N^{-1}.

STATEMENT 2: If b_k = N^{-1} then there is an ISU for A = ||a_jk||_{M×N}, M ≤ N, which contains
M elements.

PROOF: Assume that in A it is possible to position only R < M units. Without loss of gen-
erality let a_ii = 1, i = 1, ..., R (see Figure 1). According to the hypothesis, the submatrix E has no
free positions. b_k = 1/N means that in every column of D there must be free positions. Let us
assume that these positions all lie in the submatrix consisting of the k_1 lowest rows of D. Then
the submatrix E_1 (see Figure 1) has no free positions. If this were not true, one could place a unit
in such a free position, shift a corresponding unit from R to D, and in this way increase the num-
ber R. Now consider the submatrix A_1 = ||a_jk||, j, k = 1, ..., (R − k_1), and the corresponding matrix
Ã_1 = ||ã_jk||, j, k = 1, ..., (R − k_1). According to (2.2) and the equality b_k = 1/N,
M^{-1} Σ_{j=1}^{R−k_1} ã_jk ≤ b_k = N^{-1}.
This means that Ã_1 cannot be stochastic, and according to the definition of R, R_1 and D
there must be free elements in the first (R − k_1) rows placed in D_1 (see Figure 1). Suppose again
that they occupy the last k_2 rows. Repeating the above reasoning we conclude that E does not
contain free elements at all. This contradicts the assumption that the I_j, j = 1, ..., M, are nonempty.

Figure 1. Decomposition of the matrix A

2.3 Universal Rows

Now let us assume that for a given A = ||a_jk||_{M×M} the ISU consists of M elements. The ele-
ment a_jk is called admissible if after cancelling the j-th row and k-th column, the ISU for the remaining
part of A consists of (M − 1) elements. If for all k ∈ I_j, a_jk is admissible, the j-th row is called a
universal row (U-row).

STATEMENT 3: If the ISU has M elements then A has a U-row.

PROOF: Let us consider one ISU in A. It is clear that it is worthwhile to investigate only
the case when |I_j| ≥ 2, j = 1, ..., M. We shall call a cycle a closed polygon consisting of alternate
horizontal and vertical segments. It is easy to verify that if |I_j| ≥ 2 it is always possible to find
a cycle in which free positions and units alternate. Now, in this cycle, the vertex adjoining some
unit is also an admissible position for it. Let us cancel all nonoccupied elements in a constructed
cycle and repeat the procedure. Sooner or later we shall get a matrix with one or more rows con-
taining only one noncancelled position. Every such row is, according to the definition, a U-row.

REMARK: If the j-th row is a U-row and |I_j| ≥ 2, then from the proof it follows that in the k-th col-
umn, k ∈ I_j, there must be at least one free position a_rk, r ≠ j.

The existence of a U-row is very important for an algorithm whose purpose is to position
a maximal number of independent units. If we could "identify" the U-row we could safely start
by positioning a unit in it. A heuristic procedure for identifying the U-row is proposed in Sec-
tion 2.5.

2.4 Random Matrices 

The object of consideration in Sections 2.4-2.5 will be square matrices which have a random
arrangement of free and forbidden elements. Let μ_j be integers, 0 < μ_j ≤ M − 1, j = 1, ..., M, and let
A be a matrix with a fixed ISU of size M. Consider the class R_A(μ_1, ..., μ_M) which contains all
matrices obtained from A by means of all possible random choices of μ_j free positions among the
(M − 1) in the j-th row. We assume that all such choices have the same probability. Thus
we can speak about the probability of the event "the j-th row is a U-row." This probability is denoted
by P_j(μ_1, ..., μ_M). Let j_2 > j_1 and μ_{j_1} > μ_{j_2}. Then we have the following:

STATEMENT 4:

(2.4)    P_{j_1}(μ_1, ..., μ_{j_1}, ..., μ_{j_2}, ..., μ_M) ≤ P_{j_2}(μ_1, ..., μ_{j_1}, ..., μ_{j_2}, ..., μ_M)

PROOF: The proof follows from the relationships:

(2.5)    P_{j_1}(μ_1 ... μ_{j_1} ... μ_{j_2} ... μ_M) = P_{j_2}(μ_1 ... μ_{j_2} ... μ_{j_1} ... μ_M)
             ≤ P_{j_2}(μ_1 ... μ_{j_2} ... μ_{j_2} ... μ_M) ≤ P_{j_2}(μ_1 ... μ_{j_1} ... μ_{j_2} ... μ_M)

The first equality follows from symmetry; the next inequality is true because the addition of
free elements to the j_2-th row cannot increase the probability that this row remains a U-row.
The last inequality is valid because after adding (μ_{j_1} − μ_{j_2}) free elements to the j_1-th row, the j_2-th row remains
a U-row.

Statement 4 gives a background for the intuitively clear fact that in a "random" matrix the
shortest rows have a larger probability of being U-rows.

2.5 Identification of the U-row and Algorithm 1

Assume that one unit is placed in every row of A in such a manner that a position in the
j-th row is chosen with probability |I_j|^{-1}, j = 1, ..., M. Let us consider the j_0-th row. The element a_{j_0 k}
is called marked if there is at least one unit in the k-th column with j ≠ j_0. The j_0-th row is called oc-
cupied if all its free elements are marked. Now a long sequence of random placements of units is
performed. For each member of that sequence the event "the j_0-th row is occupied" or "the j_0-th row is
nonoccupied" is observed. After completing this sequence, we estimate the frequency P*_{j_0} of the
event "the j_0-th row is occupied." The heuristic principle for identification of the U-row is the following:
the row j_0 is declared to be a U-row if

(2.6)    P*_{j_0} = max_j P*_j

The reasons for this principle are Statement 4 and the following remark: among all
rows, the j_0-th row determined by (2.6) is the most "vulnerable"; it has more chances than other
rows to be lost (occupied) by the placing of units in other rows.

Now we formulate a rule for finding a suitable position in a given row (assumed to be a U-row).

The identification of a U-row might not always be correct. Therefore it is desirable to
supplement it by an optimal placement of the unit in that row. Statements 1 and 2 assert that the
maximal number of units in an ISU is guaranteed when the entropy E_A is maximal. It then seems ex-
pedient to maximize the entropy when placing the unit in a given row. This might be performed
in the following manner. Assume that a unit must be placed in the j-th row. An element a_jk, k ∈ I_j,
is chosen and we put a_jk = 1. For the other rows, the values ã_jk are defined according to (2.1), and then the
value of the entropy E_A(j, k) is calculated. The decision a_{jk*} = 1 is made if

(2.7)    E_A(j, k*) = max_{k ∈ I_j} E_A(j, k)



Combining the heuristic principles for identification of a U-row and for placing a unit in it,
we obtain the following:

a. A random placement of units is repeated K times and the U-row j_0 is found (see (2.6)).

b. In the j_0-th row a unit is placed according to (2.7) in a certain position a_{j_0 k*}.

c. The j_0-th row and k*-th column are cancelled and the process continues for the rest of the
matrix A.
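A compact sketch of one step of Algorithm 1, combining the Monte Carlo occupancy frequencies (2.6) with the entropy-maximizing placement (2.7); column indices are 0-based, and K, the seed, and the tie-breaking are illustrative choices:

```python
import math, random

def algorithm1_step(free_sets, K=200, seed=0):
    """One step of Algorithm 1: (a) over K random placements (one unit
    per row, uniform on its free positions) estimate how often each row
    is 'occupied' and declare the most-often-occupied row the U-row;
    (b) place a unit in that row where the entropy is maximal.
    Returns (row, column)."""
    rng = random.Random(seed)
    M = len(free_sets)
    occupied = [0] * M
    for _ in range(K):
        placed = [rng.choice(sorted(I)) for I in free_sets]
        for j, I in enumerate(free_sets):
            others = {placed[i] for i in range(M) if i != j}
            if all(k in others for k in I):     # every free element marked
                occupied[j] += 1
    j0 = max(range(M), key=lambda j: occupied[j])

    def entropy_after(k0):
        # column weights with row j0 fixed at k0, other rows as in (2.1)
        cols = {}
        for j, I in enumerate(free_sets):
            if j == j0:
                cols[k0] = cols.get(k0, 0.0) + 1.0
            else:
                for k in I:
                    cols[k] = cols.get(k, 0.0) + 1.0 / len(I)
        total = sum(cols.values())
        return -sum(v / total * math.log(v / total) for v in cols.values())

    k_star = max(sorted(free_sets[j0]), key=entropy_after)
    return j0, k_star
```

After each step the chosen row and column would be cancelled and the procedure repeated on the remaining matrix.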

2.6 Some Experimental Data 

With the help of a computer, units were positioned in 50 matrices of size 40×40.

Every row contained from 1 to 5 free elements, and the matrices were constructed in such a
manner that the ISU contained 40 elements. Algorithm 1 positioned 1997 units and, therefore,
failed to position three units.

2.7 Nonindependent Units 

Algorithm 1 has two important features: it is sequential and has two stages. That means that,
first, a "leading" row is chosen and, second, a position for the unit in this row is sought. The algo-
rithm requires only a "forward pass" and can be performed quickly even when it is applied to a
large-size matrix. A similar algorithm might be constructed for a much more difficult problem,
for example, the problem of disposing "chains" consisting of units. Such problems arise in
aircraft scheduling, where a_jk = 1 means that the k-th time period in the j-th airport is occupied; a
chain of units corresponds to a trip (string). In this case the units might be called "dependent"
because between the units in one row belonging to different trips, there must be a safety delay time.
The realistic criterion for a schedule would be the maximal number of trips "packed" into a given
matrix. References [3, 4] contain some facts about the realization of a sequential two-stage algo-
rithm for a large-size aircraft schedule.

2.8 Shortening the Sequential Procedure 

In computerized experiments it was observed that if at a certain stage of scheduling performed
by Algorithm 1 the rows are arranged in order of decreasing P*_j (see (2.6)), then this arrangement
remains valid, without big changes, during several steps of the procedure. Thus, in composing a
schedule for 1500 trips [4], it is enough to calculate the priorities only 5-6 times, after finding
positions for every 250-300 trips.


3.1 The Statement of the Problem 

We have M jobs (segments): the j-th one has a length of l_j units of time and a starting time t_j
which has to lie within given limits a_j, b_j: a_j ≤ t_j ≤ b_j. Every job must be performed without inter-
missions. Then there is a set of machines; each machine can perform every job, and only one job
can be performed at a given time on a given machine.

The problem is to find values t_j, j = 1, ..., M, such that the number of machines needed (fleet
size) is minimal. We assume that a_j, b_j, t_j, l_j are integers; rows in a matrix A represent jobs and
columns represent time intervals. If the j-th job is performed with starting time t_j, we put a_jk = 0
for k < t_j or k > t_j + l_j − 1, and a_jk = 1 otherwise. In that case the number of machines in the fleet is
given by the value H = max_k Σ_{j=1}^M a_jk (see [1], Chapter 2, Section 9, a special case of Dilworth's theorem).
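The fleet size H = max_k Σ_j a_jk is just the peak of the column sums; a minimal sketch with each job given as an integer pair (t_j, l_j):

```python
def fleet_size(jobs, N):
    """H = max_k sum_j a_jk: peak number of simultaneously busy machines.
    Each job is (t_j, l_j) with a_jk = 1 for t_j <= k <= t_j + l_j - 1,
    columns k = 1, ..., N."""
    load = [0] * (N + 1)                 # load[k] = column sum at time k
    for t, l in jobs:
        for k in range(t, t + l):
            load[k] += 1
    return max(load)
```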

3.2 Local Criterion 

In the case when the starting times are given only within their ranges [a_j, b_j], we propose a
scalar quantity which is in some sense similar to the value H and characterizes the
mutual disposition of all jobs.

Suppose that the j-th job can have its starting time equal to any one of the values t_j ∈ [a_j, b_j] with equal
probabilities (b_j − a_j + 1)^{-1}. The matrix Ã = ||ã_jk||_{M×N}, N = max_j (b_j + l_j − 1), is defined in the following
way: ã_jk is equal to the probability that the position a_jk would be busy when the starting time of the
j-th job is randomly chosen within its range. Let b_k = Σ_j ã_jk, k = 1, ..., N. We now introduce the entropy
of the distribution {b_k°}_1^N obtained from {b_k} after norming:

(3.1)    E_N = −Σ_{k=1}^N b_k° log b_k°

This value supplies us with information about the possible minimal number of machines in
the fleet. Thus a small value of E_N will occur when many jobs are concentrated within the same
time period. That would lead to a large number of busy machines.

Now we give a local criterion for choosing a position for a given j-th job.

Let the starting time t_{j_1} lie between a_j and b_j: a_j ≤ t_{j_1} ≤ b_j. The value t_{j_1} is chosen and the values
a_jk are defined in the following way: a_jk = 1 if k = t_{j_1}, t_{j_1}+1, ..., t_{j_1}+l_j−1, and a_jk = 0 otherwise.
Now the value of E(j_1) is calculated according to (3.1). The time t_{j*} is assumed "the best" if

(3.2)    E(j*) ≥ E(j_1) for all t_{j_1} ∈ [a_j, b_j]

3.3 Determination of Priority

Let us assume that the exact starting times for jobs with numbers j ∈ S+ are already chosen.
Every job j, j ∉ S+, is randomly placed within its range. A number j_0, j_0 ∉ S+, is fixed and the difference
ΔH(j_0) is calculated:

(3.3)    ΔH(j_0) = max_{1≤k≤N} Σ_{j=1}^M ã_jk − max_{1≤k≤N} Σ_{j=1}^M ã_jk (1 − δ_{jj_0})

where δ_{jj_0} is the Kronecker symbol. The value ΔH(j_0) has the following meaning: it is the additional number
of machines needed for the j_0-th job when this job is added to the other jobs. The procedure of random
assignment of jobs is repeated K times and the average value of ΔH(j_0) is calculated for every job
j_0 ∉ S+. We give the highest priority to the job number j* which satisfies the following inequality:

(3.4)    ΔH(j*) ≥ ΔH(j_0),   j_0 ∉ S+



a. According to the description given in Section 3.3, the job j_0 with the highest priority is found.

b. According to the local criterion (see Section 3.2), the optimal position for the job j_0 is found in
the j_0-th row.

c. The job j_0 is added to the set S+ of scheduled jobs, and then phase a is repeated.

The heuristic justification of this algorithm (Algorithm 2) is as follows. The highest priorities are given to the
most "difficult" jobs, which are more likely to demand an additional machine for themselves.
The best position for a given job is the one which provides maximal uniformity according to the
entropy-based local criterion. The algorithm is also of a "forward pass" type.
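A sketch of the priority stage of Algorithm 2: for each unscheduled job, ΔH(j_0) of (3.3) is averaged over K random placements and the job with the largest average is chosen, as in (3.4). The data layout (jobs as triples (a_j, b_j, l_j)) and the constants are illustrative:

```python
import random

def highest_priority_job(jobs, scheduled, K=100, seed=1):
    """jobs[j] = (a_j, b_j, l_j); scheduled maps job index -> fixed start.
    For every unscheduled job j0, estimate the average of
    Delta_H(j0) = H(all jobs) - H(all jobs except j0), cf. (3.3),
    over K random placements, and return the argmax as in (3.4)."""
    rng = random.Random(seed)
    N = max(b + l for (a, b, l) in jobs)

    def peak(starts):                      # H = max_k sum_j a_jk
        load = [0] * (N + 1)
        for j, t in starts.items():
            for k in range(t, t + jobs[j][2]):
                load[k] += 1
        return max(load)

    unsched = [j for j in range(len(jobs)) if j not in scheduled]
    avg = {j: 0.0 for j in unsched}
    for _ in range(K):
        starts = dict(scheduled)
        for j in unsched:                  # random start within [a_j, b_j]
            a, b, _ = jobs[j]
            starts[j] = rng.randint(a, b)
        h_all = peak(starts)
        for j in unsched:
            rest = {i: t for i, t in starts.items() if i != j}
            avg[j] += (h_all - peak(rest)) / K
    return max(unsched, key=lambda j: avg[j])
```

The placement stage would then fix the chosen job's start time by the entropy criterion (3.2) and repeat.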

3.4 Some Experimental Data 

A series of schedules was constructed on the computer according to Algorithm 2. They demon-
strated that this algorithm either provides an optimal solution or gives a schedule whose fleet size
is one machine more than in an optimal solution. A typical example of a schedule with 32 jobs in a
32×16 matrix is presented in Table 1. Algorithm 2 found the optimal solution given in this table,
which required seven machines.

It is interesting to compare this result with three alternative approaches: (1) the schedule is
constructed absolutely randomly (the priority and the position for every job are chosen randomly);
(2) the priority is random while the local criterion corresponds to Algorithm 2; (3) the priority is found
according to Algorithm 2 while the position for the job is chosen randomly within its range. The
results were the following:

(1) The best random schedule among 10,000 generated on the computer had 11 machines in the
fleet, and there were only forty such schedules found.

(2) The random priority in the presence of local optimization gave a schedule requiring
eight machines.

(3) The random local policy combined with the priority rule given in the first part of Algo-
rithm 2 gave a schedule requiring 13 machines.


In constructing large-size schedules, the most important and realistic criterion would be the
maximal number of jobs performed by a given set of machines within a given period of time in
the presence of some special restrictions. If an algorithm similar to the one described in Section 3 is
used, then it can happen that at some stage of scheduling a position cannot be found for a certain
job. Since the algorithm uses nondeterministic decisions, it seems expedient to estimate its "quality"
using the mathematical expectation of the number of jobs disposed in the schedule. That could be
done, for example, in terms of dependent trials where success in a given trial depends on
the number of successful preceding trials. An attempt to evaluate this approach was made in [2].


Table 1. The schedule for 32 jobs.

H_k — total number of machines occupied in the k-th time period.
(shaded) — time not available for the job.
■ — job performed.



[1] Ford, L. R. and D. R. Fulkerson, Flows in Networks (Princeton University Press, Princeton,
    New Jersey, 1962).
[2] Gertsbach, I., "On a Problem of Choosing the Order of Dependent Trials," Theory of Probability
    and its Applications, 17, 709-712 (1972).
[3] Gertsbach, I., M. Maksim, et al., "Heuristic Method for Constructing the Aviation Schedule,"
    in the collection Automation in Machinery, Academy of Sciences (Moscow, 1969) (in Russian).
[4] Gertsbach, I., V. Venevcev, et al., "Central Avia-schedule as a Part of an Air Traffic Control
    System," Proceedings of the First International Traffic Control Session, Sec. 6, 5-26 (Ver-
    sailles, 1970).
[5] Johnson, S., "Optimal Two- and Three-Stage Production Schedules with Setup Times Included,"
    Naval Research Logistics Quarterly, 1, 61-68 (1954).
[6] Korbut, A. and Yu. Finkelstein, Discrete Programming (Nauka, Moscow, 1969) (in Russian).
[7] Marcus, M. and H. Minc, A Survey of Matrix Theory and Matrix Inequalities (Allyn and Bacon,
    Boston, 1964).





FORCE-ANNIHILATION CONDITIONS FOR VARIABLE-COEFFICIENT LANCHESTER-TYPE EQUATIONS OF MODERN WARFARE*


James G. Taylor and Craig Comstock 

Naval Postgraduate School 
Monterey, California 


This paper develops a mathematical theory for predicting force annihilation 
from initial conditions without explicitly computing force-level trajectories for 
deterministic Lanchester-type "square-law" attrition equations for combat between 
two homogeneous forces with temporal variations in fire effectivenesses (as ex- 
pressed by the Lanchester attrition-rate coefficients). It introduces a canonical 
auxiliary parity-condition problem for the determination of a single parity- 
condition parameter ("the enemy force equivalent of a friendly force of unit 
strength") and new exponential-like general Lanchester functions. Prediction of 
force annihilation within a fixed finite time would involve the use of tabulations 
of the quotient of two Lanchester functions. These force-annihilation results pro- 
vide further information on the mathematical properties of hyperbolic-like general 
Lanchester functions: in particular, the parity-condition parameter is related to 
the range of the quotient of two such hyperbolic-like general Lanchester functions. 
Different parity-condition parameter results and different new exponential-like 
general Lanchester functions arise from different mathematical forms for the 
attrition-rate coefficients. This theory is applied to general power attrition-rate 
coefficients: exact force-annihilation results are obtained when the so-called offset 
parameter is equal to zero; while upper and lower bounds for the parity-condition 
parameter are obtained when the offset parameter is positive. 


Deterministic Lanchester-type equations of warfare (see Taylor and Brown [25], Weiss [27])
play an important role in military operations research for developing insights into the dynamics of 
combat (see, for example, Bonder and Farrell [4], or Bonder and Honig [5]), even though combat 
between two opposing military forces is a far more complex random process. The classic Lanchester 
theory of combat (see Dolansky [9]) considered constant attrition-rate coefficients. New operations 
research techniques for forecasting temporal variations in fire effectiveness (caused by, for example, 
changes in force separation, combatant postures, target acquisition rates, firing rates, etc.) have 
generated interest in variable-coefficient combat formulations. Unfortunately, the resultant differ- 
ential equations are not well studied. 

*This research was supported by the Office of Naval Research as part of the Foundation Research Program 
at the Naval Postgraduate School. 



In this paper we present a mathematical theory for predicting battle outcome from initial 
conditions without explicitly computing force-level trajectories for variable coefficient Lanchester- 
type equations of modern warfare for combat between two homogeneous forces†. The deter-
mination of conditions on initial values that predict force annihilation (in the sense of necessary 
and/or sufficient conditions) in such Lanchester-type combat leads to some new mathematical 
problems in the theory of ordinary differential equations. This force annihilation problem may be 
viewed as either a problem of determining the asymptotic behavior of the solution (depending on
given initial conditions) or a problem of determining the range of the quotient of two linearly
independent solutions to, for example, the X force-level equation [25]‡. In either case, the classic
ordinary differential equation theories (see, for example, Hille [10], Ince [11], and Olver [17]) are 
inadequate to supply all the answers sought. We show that questions of force annihilation can be 
reduced to the study of certain "exponential-like" Lanchester functions and may be simply an- 
swered by examining certain inequalities involving the initial conditions and possibly by consulting 
tabulations of new special functions that are suggested here. Our general results apply to a wide 
class of attrition-rate coefficients (namely, those that yield continuous force-level trajectories). 

Thus, in this paper we provide a general theoretical framework for determining force annihila- 
tion without explicitly computing force-level trajectories for variable-coefficient Lanchester-type 
equations of modern warfare. Other modes of battle termination are briefly discussed. We introduce 
a canonical auxiliary parity-condition problem for such determinations. New exponential-like 
general Lanchester functions arise from the solution to this problem, and tabulations of these would 
facilitate force-annihilation prediction. Different mathematical forms for attrition-rate coefficients 
lead to different auxiliary parity-condition problems. Our theory is applied to general power
attrition-rate coefficients: exact force-annihilation results are obtained for cases of "no offset"
(modelling, for example, weapon systems with the same maximum effective range); and although
qualitative results are obtained, future computational work is required for quantitative results for
cases of "offset" (modelling, for example, weapon systems with different maximum effective ranges).


F. W. Lanchester [13] hypothesized in 1914 that combat between two military forces could be 
modelled by 

(1) dx/dt=—ay, dy/dt=—bx, 

†Bonder and Honig [5] point out, however, that force annihilation may not always be the best criterion for
evaluating military operations. See pp. 192-242 of Bonder and Farrell [4] for a detailed Lanchester-type analysis of
an attack scenario for which other "end of battle conditions" play the major role in the evaluation process. Never-
theless, it is of interest to be able to easily predict the occurrence of force annihilation. Such results are not only of
intrinsic interest but also are useful in the optimization of combat dynamics (see Taylor [21, 23]).

‡Previous work by Bonder and Farrell [4], Taylor [22], and Taylor and Brown [25] shows that new transcen-
dental functions arise even in the case of linear attrition-rate coefficients reflecting weapon systems with different
effective ranges (i.e. the coefficients (12) with μ = ν = 1 and A > 0). For example, the differential equation (52) could
not be found among the 445 linear second order equations tabulated in Kamke [12]. Moreover, even when one can
express a solution in terms of previously known transcendents, the appropriate tabulations (see, for example, Abram-
owitz and Stegun [1]) may not exist (see Section 5 of [25]). As the equations of mathematical physics have provided
interest in many previously studied transcendents, variable-coefficient Lanchester-type equations provide interest
in new transcendents.



with initial conditions 

(2) x(t=0) = x_0, y(t=0) = y_0,

where t=0 denotes the time at which the battle begins, x(t) and y(t) denote the numbers of X and
Y at time t, and a and b are nonnegative constants which are today called Lanchester attrition-rate
coefficients and represent each side's fire effectiveness. Lanchester (see McCloskey [16] for his in-
fluence on operations research) considered this model (1) in order to provide insight into the dy-
namics of combat under "modern conditions" and justify the principle of concentration.† We will
accordingly refer to (1) as Lanchester's equations of modern warfare. Various sets of physical cir-
cumstances have been hypothesized to yield them: for example, (a) both sides use aimed fire and
target acquisition times are constant (see Weiss [27]), or (b) both sides use area fire and a constant
density defense (see Brackney [6]).

From (1) Lanchester deduced his classic square law 

(3) b(x_0^2 − x^2(t)) = a(y_0^2 − y^2(t)).

Consider now a battle terminated by either force level reaching a given "breakpoint":‡ for example,
Y wins when x_f = x(t_f) = x_BP but y_f > y_BP, where t_f, x_f, y_f denote final values and x_BP denotes
X's breakpoint. Let us express x_BP as a fraction of X's initial strength f_X^BP, i.e. x_BP = f_X^BP x_0, and
similarly for y_BP. It follows from (3) that

(4) Y will win ⟺ x_0/y_0 < √( a{1 − (f_Y^BP)^2} / b{1 − (f_X^BP)^2} ).

We will refer to a battle that continues until the annihilation of one side or the other (i.e. x_BP = y_BP = 0)
as a fight-to-the-finish. In this case (4) becomes


(5) Y will win a fight-to-the-finish ⟺ x_0/y_0 < √(a/b).
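The fight-to-the-finish condition (5) is easy to check numerically. A minimal sketch (the coefficient and force-level values below are illustrative choices, not data from the text) integrates (1) by Euler's method and compares the simulated winner with the prediction x_0/y_0 < √(a/b):

```python
# Numerical check of the square-law victory condition (5).  The attrition
# coefficients and initial force levels are illustrative values only.
import math

def fight_to_the_finish(x0, y0, a, b, dt=1e-3):
    """Integrate dx/dt = -a*y, dy/dt = -b*x until one force is annihilated."""
    x, y = x0, y0
    while x > 0.0 and y > 0.0:
        x, y = x - a * y * dt, y - b * x * dt   # simultaneous Euler update
    return "Y" if x <= 0.0 else "X"

a, b = 0.04, 0.01
x0, y0 = 90.0, 50.0
predicted = "Y" if x0 / y0 < math.sqrt(a / b) else "X"   # condition (5)
simulated = fight_to_the_finish(x0, y0, a, b)
```

Here x_0/y_0 = 1.8 < √(a/b) = 2, so (5) predicts a Y victory, and the simulation agrees.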

Unfortunately, no relationship similar to (3) holds in general for variable attrition-rate co- 
efficients. Hence, for such cases we will have to use a different approach to develop victory-prediction 
conditions analogous to (4) or its special case (5). Accordingly, we observe that (4) may also be 
obtained from the time history of the X force level 

(6) x(t) = {(x_0 − y_0 √(a/b)) exp(√(ab) t) + (x_0 + y_0 √(a/b)) exp(−√(ab) t)}/2,

via determining the time for X to reach his breakpoint (i.e. x(t = t_X^BP) = x_BP)

(7) t_X^BP = (1/√(ab)) ln({−x_BP + √(x_BP^2 + y_0^2 a/b − x_0^2)}/{y_0 √(a/b) − x_0}),

†The influential 19th-century German military philosopher, Carl von Clausewitz (1780-1831), stated in his
classic work On War (Vom Kriege) (see p. 276 of [7]), "The best Strategy is always to be very strong, first generally,
then at the decisive point. . . . There is no more imperative and no simpler law for Strategy than to keep the forces
concentrated."

‡As pointed out in reference [26], the entire topic of modelling battle termination is a problem area in contem-
porary defense planning studies, and there is far from universal agreement as to even which variables should be
taken as the significant variables for modeling this complex process. For further references see Taylor [23].


and requiring t_X^BP < t_Y^BP. The key result for obtaining (7) is that one of the two linearly independent
solutions to the X force-level equation d^2x/dt^2 − abx = 0 is the reciprocal of the other. For a fight-to-
the-finish, (7) becomes

(8) t_X^a = {1/(2√(ab))} ln({y_0 √(a/b) + x_0}/{y_0 √(a/b) − x_0}),

where t_X^a denotes the time to annihilate the X force. We observe that (5) is an immediate consequence
of (8)‡.
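As a concrete check, the annihilation time (8) can be verified against the closed-form trajectory (6); the coefficient values below are illustrative, not taken from the text:

```python
# Check that t_X^a from (8) is exactly the zero of the trajectory (6).
import math

def t_annihilation(x0, y0, a, b):
    """Fight-to-the-finish annihilation time (8); requires x0 < y0*sqrt(a/b)."""
    c = math.sqrt(a / b)
    return math.log((y0 * c + x0) / (y0 * c - x0)) / (2.0 * math.sqrt(a * b))

def x_level(t, x0, y0, a, b):
    """Closed-form X force level (6)."""
    c, w = math.sqrt(a / b), math.sqrt(a * b)
    return 0.5 * ((x0 - y0 * c) * math.exp(w * t) + (x0 + y0 * c) * math.exp(-w * t))

a, b, x0, y0 = 0.04, 0.01, 90.0, 50.0
tXa = t_annihilation(x0, y0, a, b)
```

For these values t_X^a = 25 ln 19 ≈ 73.6, and x(t_X^a) vanishes to machine precision.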

In many applications (see Section 3 below), one is interested in whether the battle will be
terminated within a given time t_g. In this case x_0 < y_0 √(a/b) is a necessary condition for X to be
annihilated, and annihilation occurs within the given time when t_X^a ≤ t_g. Thus, determination of whether force annihilation
will occur within a given time involves consulting a tabulation of a transcendental function, here
the natural logarithm (see also (10) below). Similar results hold for other fixed force-level breakpoints.

The time history of the X force level may also be written as

(9) x(t) = x_0 cosh(√(ab) t) − y_0 √(a/b) sinh(√(ab) t).

Taylor and Brown [25] take (9) as their point of departure for a mathematical theory for solving
variable-coefficient formulations. (7) does not follow directly from (9), but (5) does via x(t = t_X^a) = 0 and

(10) t_X^a = (1/√(ab)) tanh^(-1)(x_0/(y_0 √(a/b))),

since the range of the hyperbolic tangent is [0, 1) for nonnegative arguments.

The purpose of this paper is to generalize the above to the general case of variable attrition-
rate coefficients. We use (6) as our point of departure and base our development on the observation
that (5) follows directly from (6), since the second term in brackets is always positive and goes to
zero as t → +∞. (We will ignore the physical impossibility of negative force levels in developing
results like (5).) Thus, by (6)

lim_{t→+∞} x(t) = −∞ ⟺ x_0 < √(a/b) y_0,

whence follows (5)†.


The pioneering work of S. Bonder [4, 5] on methodology for the evaluation of military systems 
(in particular, mobile systems such as tanks) has generated interest in variable-coefficient 

†It is obvious that at most one of x(t) and y(t) can vanish. In this case we have what the mathematician calls
a nonoscillatory solution to (1) (see p. 373 of Hille [10]). Consequently, if the time to annihilate X (i.e. t_X^a as given
by (8)) is well defined, then we must have y(t) > 0 for all t ≥ 0. Expression (5) is obtained from (8) by requiring that
the argument of the natural logarithm in (8) is positive, i.e. by requiring that t_X^a be well defined. The nonoscillation
of all solutions to (11) (a special case of which is (1)) is an immediate consequence of the identity

x(t)y(t) = x_0 y_0 − ∫_0^t {a(s)y^2(s) + b(s)x^2(s)} ds,

which follows from multiplying the first equation of (11) by y, the second by x, adding, and integrating the result
between 0 and t. From the above identity, it is clear that if x(t) ever becomes zero, then y(t) > 0 for all t ≥ 0.


Lanchester-type equations and has led to improved operations research techniques for the predic- 
tion of such coefficients (see Bonder [2, 3] ; background and further references are given in Taylor 
and Brown [25]). Thus, we consider 

(11) dx/dt=—a(t)y, dy/dt=—b(t)x, 

where a(t) and b(t) denote time-dependent Lanchester attrition-rate coefficients. 

These coefficients depend on such variables as firing doctrine, firing rate, rate of target acquisition,
force separation, tactical posture of targets, etc. (see [4]). Without loss of generality, we may take
a(t) = k_a g(t) and b(t) = k_b h(t), where g(t) and h(t) denote time-varying factors such that a(t)/b(t)
= k_a/k_b = constant for g(t) = h(t). We will also refer to (11) as the equations for a square-law attrition
process, since an "instantaneous" square law holds even when a(t)/b(t) is not constant (see Taylor and
Parry [26]; also references [21, 22, and 24]).
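No closed-form square law is available for (11), but its force-level trajectories are easy to generate numerically; the following Runge-Kutta sketch is exactly the kind of computation the theory developed below is designed to avoid. The integrator is a generic one; nothing here is specific to the paper's methods:

```python
# Direct numerical integration of (11), dx/dt = -a(t)y, dy/dt = -b(t)x,
# by the classical fourth-order Runge-Kutta method.
def simulate(a, b, x0, y0, t_end, n=2000):
    """Return (x(t_end), y(t_end)) for time-dependent coefficients a(t), b(t)."""
    h = t_end / n
    x, y, t = x0, y0, 0.0
    f = lambda t, x, y: (-a(t) * y, -b(t) * x)
    for _ in range(n):
        k1 = f(t, x, y)
        k2 = f(t + h / 2, x + h / 2 * k1[0], y + h / 2 * k1[1])
        k3 = f(t + h / 2, x + h / 2 * k2[0], y + h / 2 * k2[1])
        k4 = f(t + h, x + h * k3[0], y + h * k3[1])
        x += h / 6 * (k1[0] + 2 * k2[0] + 2 * k3[0] + k4[0])
        y += h / 6 * (k1[1] + 2 * k2[1] + 2 * k3[1] + k4[1])
        t += h
    return x, y
```

With constant coefficients the result agrees with the closed form (6); with time-varying coefficients it produces the trajectory whose explicit computation the annihilation-prediction theory avoids.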

A large class of combat situations of interest can be modeled with the following attrition-rate
coefficients (see [4]):

(12) a(t) = k_a (t + C)^μ and b(t) = k_b (t + C + A)^ν,

where A, C ≥ 0. We will refer to these coefficients as general power attrition-rate coefficients. The
modeling roles of A and C are discussed in Taylor and Brown [25]. We will refer to C as the starting
parameter, since it allows us to model (with μ, ν > 0) battles that begin within the minimum of the
maximum effective ranges of the two systems. We will refer to A as the offset parameter, since it
allows us to model (again, with μ, ν > 0) battles between weapon systems with different ranges
(i.e. opposing weapons whose fire effectiveness is "offset"). For example, let us consider Bonder's
[4] constant-speed attack on a static defensive position (see also [22, 25]). Then we have

(13) dx/dt = −α(r)y, dy/dt = −β(r)x,

where r(t) = R_0 − vt denotes the distance (range) between the two opposing forces, R_0 denotes the
battle's opening range, v > 0 denotes the constant attack speed,

(14) α(r) = 0 for r > R_α, and α(r) = α_0 (1 − r/R_α)^μ for 0 ≤ r ≤ R_α,

μ > 0, and R_α denotes the maximum effective range of Y's weapon system. Similarly for β(r), with
exponent ν > 0. In (14) the parameter μ allows us to model the range dependence of Y's fire effective-
ness (see Figure 1). The offset and starting parameters are given by

(15) A = (R_β − R_α)/v, and C = (R_α − R_0)/v,

and the assumption A, C > 0 implies that R_β > R_α > R_0. From considering (15) and Figure 2, the
reader should have no trouble understanding our terminology for A and C. In this model the
defensive position is overrun by the attackers (i.e. zero force separation is reached) at time t_g = R_0/v,
and this leads to interest in predicting battle termination within a given time t_g (see Section 2 above).

†It is clear that x(t) → −∞ implies that there exists a finite t_X^a such that x(t_X^a) = 0. By the nonoscillation of
solutions to (11) (a special case of which is (1)) it follows that y(t) > 0 for all t ≥ 0, so that Y must win a fight-to-the-
finish in finite time (see footnote †, p. 352).

[Figure 1 is not reproducible here.] Figure 1. Dependence of the attrition-rate coefficient α(r) on the exponent μ for constant
maximum effective range of the weapon system and constant kill capability at zero range.
(The maximum effective range of the system is denoted R_α = 2000 meters; α(r=0) = α_0 =
0.6 X casualties/(unit time × number of Y units) denotes the Y force weapon system
kill rate at zero force separation (range). The opening range of battle is denoted as R_0 =
1250 meters and (as shown) R_0 < R_α.)

[Figure 2 is not reproducible here.] Figure 2. Explanation of starting parameter C and offset parameter A for power attrition-
rate coefficients modelling constant-speed attack. (The maximum effective ranges of
the two weapon systems are denoted as R_α and R_β. The opening range of battle is de-
noted as R_0 and (as shown) R_0 < minimum(R_α, R_β). The starting parameter is given
by C = (R_α − R_0)/v. The offset parameter is given by A = (R_β − R_α)/v.)

Almost all previous work on the variable-coefficient equations (11) has developed infinite
series solutions for force-level trajectories or represented these by tabulated functions (see [25], in
particular Section 3). Relatively little attention has been given to determining the qualitative
behavior of solutions to (11) (such as prediction of battle outcome) without explicitly computing
battle trajectories. Bonder and Farrell [4]†, however, have considered force annihilation within a
given time. Using comparison techniques from the theory of ordinary differential equations (see,
for example, Coddington and Levinson [8]), they obtained a rather strong sufficient condition for
the special case of the model (13) with μ = ν = 1 and A > 0.
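The relationship (15) between the range parameters of the constant-speed-attack model (13)-(14) and the offset and starting parameters of the power coefficients (12) can be sketched as follows. All numerical values (ranges, speed, exponents, kill rates) are illustrative assumptions, not data from the paper; the identity k_a = α_0 (v/R_α)^μ follows from substituting r = R_0 − vt into (14):

```python
# Offset parameter A and starting parameter C of the general power
# attrition-rate coefficients (12), computed via (15) from the
# constant-speed-attack model.  All numerical values are illustrative.
def offset_and_starting(R_alpha, R_beta, R0, v):
    """(15): A = (R_beta - R_alpha)/v, C = (R_alpha - R0)/v."""
    return (R_beta - R_alpha) / v, (R_alpha - R0) / v

def a_coeff(t, k_a, mu, C):
    """a(t) = k_a (t + C)^mu of (12), with k_a = alpha_0 (v / R_alpha)^mu."""
    return k_a * (t + C) ** mu

def b_coeff(t, k_b, nu, C, A):
    """b(t) = k_b (t + C + A)^nu of (12)."""
    return k_b * (t + C + A) ** nu

R_alpha, R_beta, R0, v = 2000.0, 3000.0, 1250.0, 5.0   # meters, meters/second
A, C = offset_and_starting(R_alpha, R_beta, R0, v)     # here A = 200 s, C = 150 s
```

With these (hypothetical) ranges the attacker starts inside Y's envelope (C > 0) but outside X's (A > 0), which is exactly the "offset" situation the text describes.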


Motivated by the constant-coefficient results††, we introduce the exponential-like general
Lanchester functions E_X^+, E_X^−, E_Y^+, and E_Y^−, defined by

(16) dE_X^+/dt = √(k_b/k_a) a(t) E_Y^+ with E_X^+(t = t_0) = 1/Q,

     dE_Y^+/dt = √(k_a/k_b) b(t) E_X^+ with E_Y^+(t = t_0) = 1,

and

(17) dE_X^−/dt = −√(k_b/k_a) a(t) E_Y^− with E_X^−(t = t_0) = 1,

     dE_Y^−/dt = −√(k_a/k_b) b(t) E_X^− with E_Y^−(t = t_0) = Q,

where Q > 0 is to be determined so that E_X^− and E_Y^− have certain desired properties. We assume
that a(t) and b(t) are defined, positive, and continuous for t_0 ≤ t < +∞ with t_0 ≤ 0. Further restric-
tions on a(t) and b(t) are given by Conditions (A) and (B) below.

We want the solutions E_X^+ and E_Y^+ to (16) and E_X^− and E_Y^− to (17) to satisfy the given
initial conditions, to be continuous, and to act like exponentials‡ for general a(t) and b(t). Accord-
ingly, we must further restrict a(t) and b(t) slightly. We therefore assume that the following two
conditions hold.

CONDITION (A): ∫_{t_0}^t a(s)ds and ∫_{t_0}^t b(s)ds are both bounded for all finite t ≥ t_0.

CONDITION (B): lim_{t→+∞} ∫_{t_0}^t a(s)ds = +∞, and lim_{t→+∞} ∫_{t_0}^t b(s)ds = +∞.

We then have (see Hille [10] or Lee and Markus [15])

THEOREM 1: Condition (A) is a necessary and sufficient condition for (16) and (17) to have a
continuous solution for all finite t ≥ t_0. Also,

THEOREM 2: Condition (B) implies that both E_X^+ and E_Y^+ are unbounded, i.e.
lim_{t→+∞} E_X^+(t) = lim_{t→+∞} E_Y^+(t) = +∞.

†Bonder and Farrell [4] take range (i.e. force separation) to be the independent variable in their work, while
Taylor [22] and Taylor and Brown [25] take time, as we have done in this paper.

††Recalling the constant-coefficient result (6), we consider

d{exp(√(ab) t)}/dt = a √(b/a) exp(√(ab) t) = b √(a/b) exp(√(ab) t),

d{exp(−√(ab) t)}/dt = −a √(b/a) exp(−√(ab) t) = −b √(a/b) exp(−√(ab) t),

to obtain motivation for (16) and (17).

‡We want E_X^− to behave like a decaying exponential and E_X^+ like an increasing one.


PROOF: The theorem follows from observing that, for example, dE_X^+/dt ≥ √(k_b/k_a) a(t), since E_Y^+(t) ≥ 1 for all t ≥ t_0.

Besides satisfying (16) and (17), E_X^+ and E_X^− are linearly independent solutions to the X
force-level equation

(18) d^2x/dt^2 − {(1/a)(da/dt)} dx/dt − a(t)b(t) x = 0.

In the initial conditions of (16) and (17), t_0 = max(t_X, t_Y), where t_X denotes the largest finite
singular point on the t-axis for (18) (see Taylor and Brown [25]; also p. 69 of Ince [11])††. Since
E_X^+ and E_X^− are linearly independent, we may use them to construct the general solution to (18).
It follows that the solution to (11) with initial conditions (2) is given by

(19) x(t) = {[x_0 E_Y^−(t=0) − √(k_a/k_b) y_0 E_X^−(t=0)] E_X^+(t) + [x_0 E_Y^+(t=0) + √(k_a/k_b) y_0 E_X^+(t=0)] E_X^−(t)}/2,

y(t) = {[y_0 E_X^−(t=0) − √(k_b/k_a) x_0 E_Y^−(t=0)] E_Y^+(t) + [y_0 E_X^+(t=0) + √(k_b/k_a) x_0 E_Y^+(t=0)] E_Y^−(t)}/2,

where we have made use of the easily verifiable fact that (see [25])

(20) E_X^+(t) E_Y^−(t) + E_X^−(t) E_Y^+(t) = 2 for all t.

Let us now consider how to choose the general Lanchester functions E_X^− and E_X^+ so that
they play the roles of a decaying exponential and an increasing one. Force-annihilation-prediction
conditions may then be obtained from (19) by inspection (again, see footnote, p. 353). We recall
the constant-coefficient result (6) and its consequence (5), obtained using

lim_{t→+∞} exp(−√(ab) t) = 0 and exp(±√(ab) t) > 0.

Clearly, E_X^+(t) > 0 for all t ≥ t_0 when Q > 0. We have already shown that Condition (B) implies
that E_X^+(t), which satisfies (16), grows without bound just as an increasing exponential does.
We will now show how to choose E_X^− so that it corresponds to a decaying exponential. Similar
statements hold for E_Y^− and E_Y^+. Considering (17), we see that we should choose E_X^− and E_Y^−
to remain positive for all t so that by (17) they continuously decrease. Furthermore, we will be
able to specify such behavior for E_X^− and E_Y^− by our selection of the parameter Q in the initial
conditions for (17).

The solution E_X^−(t), E_Y^−(t) to (17) depends continuously on the parameter Q of the initial
conditions (see, for example, Hille [10]). We denote this dependence by E_X^−(t; Q), E_Y^−(t; Q). Let
Q* = Q*(a(t), b(t)) denote the unique (see [10]) value of Q such that

(21) E_X^−(t; Q=Q*), E_Y^−(t; Q=Q*) > 0 for all finite t ≥ t_0.
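Numerically, the parity-condition parameter can be approximated by a shooting argument: integrating (17) forward, too large a Q drives E_X^− through zero, while too small a Q drives E_Y^− through zero, so Q* can be bracketed by bisection. A minimal sketch under that assumption (the Euler stepping, horizon T, and search bracket are arbitrary implementation choices, and the routine is exercised below only on the constant-coefficient case, where Q* = 1):

```python
# Shooting/bisection sketch for the auxiliary parity-condition problem:
# find Q such that the solutions E_X^-, E_Y^- of (17) stay positive.
import math

def shoot(Q, a, b, ka, kb, t0, T, n=4000):
    """Euler-integrate (17) on [t0, T]; return -1 if E_X^- crosses zero first
    (Q too large), +1 if E_Y^- does (Q too small), 0 if both stay positive."""
    h = (T - t0) / n
    ex, ey, t = 1.0, Q, t0
    ra, rb = math.sqrt(kb / ka), math.sqrt(ka / kb)
    for _ in range(n):
        ex, ey = ex - h * ra * a(t) * ey, ey - h * rb * b(t) * ex
        t += h
        if ex <= 0.0:
            return -1
        if ey <= 0.0:
            return +1
    return 0

def parity_parameter(a, b, ka, kb, t0=0.0, T=30.0, lo=1e-6, hi=1e6, iters=60):
    """Bisect (on a log scale, since Q > 0) for the parity parameter Q*."""
    for _ in range(iters):
        mid = math.sqrt(lo * hi)
        if shoot(mid, a, b, ka, kb, t0, T) < 0:
            hi = mid        # Q too large
        else:
            lo = mid        # Q too small (or not yet decided on [t0, T])
    return math.sqrt(lo * hi)
```

For a(t) = b(t) = 1 with k_a = k_b = 1 the exact parity parameter is Q* = 1 (both E_X^− and E_Y^− then equal e^{−t}), and the routine recovers it.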

††We take a(t) and b(t) to be analytic in the (finite) complex plane except for a finite number of singularities on
the real axis. The singularities of (18) then occur at the zeros and singularities of a(t) and at the singularities of
a(t)b(t). Consequently, t_0 belongs to the set of points consisting of the zeros and singularities of a(t) and b(t) (see
Taylor and Brown [25]). We define t_0 this way in order to reduce the number of tabulations of exponential general
Lanchester functions required for force-annihilation analyses (see Theorem 4). For example, for the general power
attrition-rate coefficients (12) we have t_0 = −C, and for fixed A > 0, μ, and ν only a single tabulation of e_X^− and e_Y^− is
required to handle all problems with C > 0 (see, for example, Theorem 5).


Using arguments similar to those given by Hille (see pp. 437-439 of [10]), one can show that†

(22) lim_{t→+∞} E_X^−(t; Q*) = lim_{t→+∞} E_Y^−(t; Q*) = 0.

It is intuitively obvious that such a Q* exists, and we will prove its existence in some particular
cases below. As we shall see, knowledge of Q* provides valuable information about the qualitative
behavior of force-level trajectories for the Lanchester-type equations (11). Let us refer to the
problem of determining Q* such that (22) holds as the auxiliary parity-condition problem. Unless
explicitly stated otherwise, for convenience we will denote, for example, E_X^+(t; Q*) as E_X^+(t).
COMMENT 1: For a constant ratio of attrition-rate coefficients, i.e.

(23) a(t) = k_a h(t), and b(t) = k_b h(t),

where h(t) denotes the common time-varying factor of the two coefficients, it readily follows from
the results given in references [4] and [20] that Q* = 1 and

(24) E_X^+(t) = E_Y^+(t) = exp{ψ(t)}, and E_X^−(t) = E_Y^−(t) = exp{−ψ(t)},

where ψ(t) = ∫_{t_0}^t √(a(s)b(s)) ds.

COMMENT 2: By Theorem 1 of Taylor and Brown [25], the solution (19) simplifies to the
form of (6) only if (23) holds.

The above exponential-like general Lanchester functions may be related to Taylor and Brown's
[25] hyperbolic-like general Lanchester functions. Let

(25) C_X(t) = x_1(t), S_X(t) = x_2(t), C_Y(t) = y_1(t), and S_Y(t) = y_2(t),

where x_1, x_2, y_1, and y_2 denote the general Lanchester functions introduced by Taylor and Brown.
Then, similar to the well-known relationships between the hyperbolic and exponential functions,
we have

(26) C_X(t) = {Q* E_X^+(t) + E_X^−(t)}/2, S_X(t) = {Q* E_X^+(t) − E_X^−(t)}/(2Q*),

C_Y(t) = {Q* E_Y^+(t) + E_Y^−(t)}/(2Q*), S_Y(t) = {Q* E_Y^+(t) − E_Y^−(t)}/2.

The determination of Q* will be slightly simplified for general power attrition-rate coefficients
(12) by considering a modified auxiliary parity-condition problem. For this purpose we introduce
the new independent variable

(27) s = K √(k_b/k_a) ∫_{t_0}^t a(v) dv,

and define s_0 = s(t=0) ≥ 0 for t_0 ≤ 0. K is an, at present, undetermined parameter. It will be chosen
so that a more convenient canonical system of differential equations arises in the modified auxiliary
parity-condition problem. By Condition (A), the transformation (27) is well defined for t ≥ t_0. It
has an inverse t(s), since a(t) > 0 for all t ≥ t_0.

†From (17) it is clear that E_X^− and E_Y^− are continuously decreasing when (21) holds.

Letting

(28) e_X^+(s) = K E_X^+(t(s)), e_Y^+(s) = E_Y^+(t(s)), e_X^−(s) = E_X^−(t(s)), and e_Y^−(s) = E_Y^−(t(s))/K,

the substitution (27) transforms (16) through (18) into


(29) de_X^+/ds = e_Y^+ with e_X^+(s=0) = 1/Z,

     de_Y^+/ds = I(s) e_X^+ with e_Y^+(s=0) = 1,

(30) de_X^−/ds = −e_Y^− with e_X^−(s=0) = 1,

     de_Y^−/ds = −I(s) e_X^− with e_Y^−(s=0) = Z,

and

(31) d^2x/ds^2 − I(s) x = 0,

where, for any Q,

(32) Z = Q/K,

and

(33) I(s) = ({b(t)/k_b}/{a(t)/k_a})/K^2

is the invariant of the normal form (31) (see p. 119 of Kamke [12]) and t = t(s) by (27). The pa-
rameter K will be chosen to simplify the form of I(s). In our later work the equation (31) will be
easier to analyze than (18).

We will refer to the problem of determining Z* = Z*(a(t), b(t)) such that

(34) e_X^−(s; Z=Z*), e_Y^−(s; Z=Z*) > 0 for all finite s > 0,

as the modified auxiliary parity-condition problem. By (32) we then have that Q* = KZ*.

We also observe that the solution to (31) which satisfies (2) may be expressed in terms of
these exponential-like general Lanchester functions for s ≥ s_0 as

(35) x(s) = {[x_0 K e_Y^−(s=s_0) − √(k_a/k_b) y_0 e_X^−(s=s_0)] e_X^+(s)/K

+ [x_0 K e_Y^+(s=s_0) + √(k_a/k_b) y_0 e_X^+(s=s_0)] e_X^−(s)/K}/2,

and similarly

(36) y(s) = {[y_0 e_X^−(s=s_0) − √(k_b/k_a) x_0 K e_Y^−(s=s_0)] e_Y^+(s)

+ [y_0 e_X^+(s=s_0) + √(k_b/k_a) x_0 K e_Y^+(s=s_0)] e_Y^−(s)}/2,

where by (20) and (28)

(37) e_X^+(s) e_Y^−(s) + e_X^−(s) e_Y^+(s) = 2 for all s.

From our choice of Q* such that (21) and (22) hold, we can immediately infer the behavior
of the solution (19) to (11) as t → +∞, and similarly for y(t). Thus, we have

THEOREM 3: lim_{t→+∞} x(t) = −∞ if and only if

x_0/y_0 < √(k_a/k_b) E_X^−(t=0; Q*)/E_Y^−(t=0; Q*).
Equivalently, we may state (see footnote, p. 353 again) 


THEOREM 4: Consider combat between two homogeneous forces described by (11). Assume
that (11) applies for all time and that Y "wins" when x(t_f) = 0 with y(t_f) > 0. Then, Y will win if
and only if

x_0/y_0 < √(k_a/k_b) E_X^−(t=0; Q*)/E_Y^−(t=0; Q*).
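Theorem 4 can be illustrated end to end for a nonconstant coefficient ratio. The choice a(t) = 1 + t, b(t) = 1 (with k_a = k_b = 1 and t_0 = 0, so that E_X^−(t=0) = 1 and E_Y^−(t=0) = Q* trivially) is an illustrative assumption, not a case from the text; Q* is found by the shooting/bisection idea of the auxiliary parity-condition problem, and the predicted winner is confirmed by direct integration of (11):

```python
# Sketch of Theorem 4 with t_0 = 0 and k_a = k_b = 1, where the prediction
# reads "Y wins iff x_0/y_0 < 1/Q*".  The coefficients below are illustrative.
import math

a = lambda t: 1.0 + t     # a(t)/b(t) is not constant, so (3)-(5) do not apply
b = lambda t: 1.0

def shoot(Q, T=10.0, n=4000):
    """-1 if E_X^- of (17) crosses zero first (Q too large), +1 if E_Y^- does."""
    h, ex, ey, t = T / n, 1.0, Q, 0.0
    for _ in range(n):
        ex, ey = ex - h * a(t) * ey, ey - h * b(t) * ex
        t += h
        if ex <= 0.0:
            return -1
        if ey <= 0.0:
            return +1
    return 0

lo, hi = 1e-3, 1e3
for _ in range(50):
    mid = math.sqrt(lo * hi)
    lo, hi = (lo, mid) if shoot(mid) < 0 else (mid, hi)
Q_star = math.sqrt(lo * hi)

def winner(x0, y0, T=10.0, n=100000):
    """Direct Euler integration of (11); returns the side that survives."""
    h, x, y, t = T / n, x0, y0, 0.0
    while x > 0.0 and y > 0.0 and t < T:
        x, y = x - h * a(t) * y, y - h * b(t) * x
        t += h
    return "Y" if x <= 0.0 else "X"
```

Force ratios safely on either side of the predicted parity value 1/Q* then yield the predicted winner: winner(0.5/Q_star, 1.0) gives "Y" and winner(2.0/Q_star, 1.0) gives "X".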

Thus, we see that tabulations of such new exponential-like general Lanchester functions E_X^−(t; Q*)
and E_Y^−(t; Q*) would facilitate force-annihilation prediction. By our choice of t_0 only one such tabu-
lation is necessary for given attrition-rate coefficients a(t) and b(t) (see Note 9). Alternatively,
we may express the force-annihilation condition of Theorem 4 in terms of Taylor and Brown's
[25] hyperbolic-like general Lanchester functions (see equations (25) and (26)). We have then

THEOREM 4': Consider combat between two homogeneous forces described by (11). Assume
that (11) applies for all time and that Y "wins" when x(t_f) = 0 with y(t_f) > 0. Then, Y will win if
and only if

x_0/y_0 < √(k_a/k_b) {C_X(t=0) − Q* S_X(t=0)}/{Q* C_Y(t=0) − S_Y(t=0)}.

As an immediate corollary to Theorem 4 we have

COROLLARY 4.1: For t_0 = 0, Y will win if and only if

x_0/y_0 < (1/Q*) √(k_a/k_b).

One shortcoming of our above development is that Theorem 4 and its corollaries are basically
existence theorems for the annihilation of one side by the other at some unknown future point
in time. As the enemy initial force level decreases towards parity (i.e. equality holding in (38)),
the time required to annihilate a force becomes larger and larger. There is, in fact, no limit to how
large it may become. Moreover, there is a large class of tactically significant battles (see Bonder's [4]
constant-speed attack on a static defensive position discussed in Section 3 above) which has a built-in
time limit, denoted as t_g. Hence, it would be desirable to have a method for determining from
initial conditions (without explicitly computing the entire force-level trajectories) whether or not
force annihilation will occur within a given finite time t_g. For our general model (11), Theorem 4
tells us that

(38) x_0/y_0 < √(k_a/k_b) E_X^−(t=0; Q*)/E_Y^−(t=0; Q*),

is a necessary condition for x(t_X^a) = 0 with t_X^a ≤ t_g, where t_X^a denotes the time at which the X force
is annihilated. Motivated by the well-known constant-coefficient result (8), one intuitively sees
that determining whether or not t_X^a ≤ t_g will require the appropriate tabulations of new transcendents
(i.e. new functions). Different such functions arise from different fundamental systems chosen
to construct the solution to, for example, the X force-level equation (18). Let us now investigate
which fundamental system of solutions is the most useful.

We begin our investigation by outlining, for the new exponential-like general Lanchester func-
tions, how to determine from initial conditions (without explicitly computing the force-level tra-
jectories) whether or not force annihilation will occur within a given finite time t_g. Looking at (19)

and setting x(t) = 0, we see that we must solve for E_X^+(t)/E_X^−(t) = η(t)†. For any other fundamental
system (i.e. pair) of solutions, we must still solve for such a quotient (cf. the constant-coefficient
results (8) and (10)). Thus, our force-annihilation determination requires the use of tabulations
of the quotient of two linearly independent solutions to, for example, the X force-level equation.
Our question is now which quotient is the most useful (i.e. which fundamental system yields the
most useful quotient). We will show that the quotient of two exponential-like general Lanchester
functions is not numerically satisfactory for such determinations, whereas the quotient of two
hyperbolic-like general Lanchester functions is.

Motivated by the result for a constant ratio of attrition-rate coefficients (23) that

η(t) = exp{2ψ(t)},

we use the following notation for the η(t) of the X force-level equation

(39) E_2X^+(t) = E_X^+(t)/E_X^−(t),

with E_2Y^+(t) being similarly defined. Assuming that (38) holds, we see from (19) that if x(t = t_X^a) = 0,

E_2X^+(t_X^a) = {√(k_a/k_b) y_0 E_X^+(t=0; Q*) + x_0 E_Y^+(t=0; Q*)}/{√(k_a/k_b) y_0 E_X^−(t=0; Q*) − x_0 E_Y^−(t=0; Q*)}.

We will prove below that E_2X^+(t) = E_2X^+(t; Q*) is a strictly increasing function of t with initial value
1/Q* at t = t_0. Consequently, the inverse function (E_2X^+)^(-1)(ξ) is well defined for all ξ ∈ [1/Q*, +∞), and in
this case the time to annihilate X is given by

(40) t_X^a = (E_2X^+)^(-1)({√(k_a/k_b) y_0 E_X^+(t=0; Q*) + x_0 E_Y^+(t=0; Q*)}/{√(k_a/k_b) y_0 E_X^−(t=0; Q*) − x_0 E_Y^−(t=0; Q*)}).

However, in general for x_BP ≠ 0 we have been unable to develop an analogous result‡.

We now show that E_2X^+(t) (defined by (39)) is a strictly increasing function. We readily
compute, using (16), (17), and (20), that

(41) dE_2X^+/dt = 2 √(k_b/k_a) a(t)/{E_X^−(t)}^2 with E_2X^+(t = t_0) = 1/Q*,

whence follows the monotonicity. Similar results hold for the modified exponential-like general
Lanchester functions defined by (29) and (30).

†It is well known (see, for example, pp. 647-650 of Hille [10] or p. 120 of Kamke [12]) that the quotient of two
linearly independent solutions to (31), which is equivalent to (18), satisfies Schwarz's (third order) differential
equation (see Schwarz [18])

{η, s} = −2 I(s),

where η denotes the quotient of two linearly independent solutions to (31) [i.e. η = e_X^+(s)/e_X^−(s)],

{η, s} = η'''/η' − (3/2)(η''/η')^2

denotes the Schwarzian derivative of η with respect to s, η' denotes dη/ds, etc., and I(s) denotes the invariant of the
normal form (31). For numerical computation of η, however, there is no advantage in considering this third order
equation, and it is preferable to calculate η from, for example, η(s) = e_X^+(s)/e_X^−(s), where e_X^+ and e_X^− also satisfy
(29) and (30).

‡Recalling our development of (7), we see that, except for the special case in which (23) holds, this same ap-
proach fails to yield the time for X to reach his breakpoint (assumed to be positive)

(i.e. $t_X^{BP}$ such that $x(t = t_X^{BP}) = x_{BP} > 0$).

Consequently, it is apparently impossible to predict in the manner described in the main text the outcome of a
fixed-force-level-breakpoint battle with positive breakpoints unless (23) holds.


Thus, (I) determination of $Q^* = Q^*(a(t), b(t))$, and (II) tabulation of $E_{2X}^+(t)$ and $E_{2Y}^+(t)$ would
allow one to determine whether or not force annihilation occurs in such battles with finite time
limit without explicitly computing the entire force-level trajectories (see Figures 3, 4, and 5 of
[25]). Unfortunately, there is a serious drawback to considering $E_{2X}^+(t)$ and $E_{2Y}^+(t)$: accurate
tabulations are difficult to generate (in fact, they are essentially impossible for large values of $t$),
since both functions are basically increasing exponentials, so that any error in their initial value
$1/Q^*$ (which, in general, can be determined only approximately by numerical means) becomes tremendously
magnified over time.
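The magnification is easy to see concretely in the constant-coefficient special case, where the exponential-like quotient behaves like $q_0 e^{2t}$ while a tanh-like quotient remains bounded. The following sketch (our own illustration, with hypothetical numbers) makes the point:

```python
import numpy as np

# Constant-coefficient illustration: the exponential-like quotient behaves
# like q0 * exp(2t) with q0 = 1/Q*, so a small error eps in the numerically
# determined q0 is amplified by exp(2t); a tanh-like quotient stays in (0, 1)
# and is insensitive to the same perturbation.  Numbers are hypothetical.
q0, eps, t = 1.0, 1e-6, 20.0
err_exp = (q0 + eps) * np.exp(2 * t) - q0 * np.exp(2 * t)
err_tanh = np.tanh(t + eps) - np.tanh(t)
print(err_exp)    # the 1e-6 initial error has grown past 1e11
print(err_tanh)   # essentially 0
```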

We may develop numerically satisfactory functions for prediction of force annihilation within
a given finite time, however, by considering the hyperbolic-like general Lanchester functions of
Taylor and Brown [25]. Let us therefore define

(42) $T_X(t) = S_X(t)/C_X(t)$.

The functions $T_X(t)$ and $T_Y(t)$ are analogous to the hyperbolic tangent, to which they reduce
for a constant ratio of attrition-rate coefficients. Considering (25) and equation (16) of [25], we
see that, for example, $T_X(t)$ does not depend on $Q^*$, since $S_X(t)$ and $C_X(t)$ do not. Thus, $T_X(t)$
and $T_Y(t)$ are numerically suitable for determining whether or not force annihilation will occur
within a given finite time.

Using the results given in Table I of Taylor and Brown [25], we readily compute that

(43) $dT_X/dt = \sqrt{k_b/k_a}\, a(t)/\{C_X(t)\}^2$, with $T_X(t = t_0) = 0$.

Hence, $T_X(t)$ is a strictly increasing function, and its inverse $T_X^{-1}$ is well defined. Let us now es-
tablish an upper bound for $T_X(t)$. By (26) and (42), we have

$T_X(t) = (1/Q^*)\{Q^* E_X^+(t) - E_X^-(t)\}/\{Q^* E_X^+(t) + E_X^-(t)\}$,

whence it follows that for $t > t_0$

(44) $0 \le T_X(t) < 1/Q^*$, with $\lim_{t \to +\infty} T_X(t) = 1/Q^*$.

Thus, our current investigation has yielded important information about the asymptotic behavior
of hyperbolic-like general Lanchester functions.

To determine $t_X^a$ such that $x(t_X^a) = 0$, we write the solution to (18) which satisfies the initial
conditions (2) as [25]

$x(t) = x_0\{C_Y(t=0)C_X(t) - S_Y(t=0)S_X(t)\} - y_0\sqrt{k_a/k_b}\{C_X(t=0)S_X(t) - S_X(t=0)C_X(t)\}$,

and find that when (38) holds, $t_X^a$ is given by

(45) $t_X^a = T_X^{-1}\big(\{x_0 C_Y(t=0) + y_0\sqrt{k_a/k_b}\, S_X(t=0)\}/\{y_0\sqrt{k_a/k_b}\, C_X(t=0) + x_0 S_Y(t=0)\}\big)$.

We observe that by (26) the argument of the inverse function $T_X^{-1}$ in (45) belongs to the range of
$T_X$ (see (44)) when (38) holds (see Theorem 4'). For $t_0 = 0$, (45) simplifies to

$t_X^a = T_X^{-1}\big(x_0/(y_0\sqrt{k_a/k_b})\big)$.

Whether or not force annihilation occurs within a given finite time $t_g$ then depends on whether or
not $t_X^a \le t_g$. Thus, (I) determination of $Q^*$, and (II) tabulation of $T_X(t)$ and $T_Y(t)$ would allow one
to determine (without explicitly computing the force-level trajectories) the time at which a side is
annihilated.


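The tabulate-and-invert scheme for $t_X^a$ can be sketched in the constant-ratio special case, where $T_X(t)$ reduces to $\tanh(\tau(t))$ with $\tau(t) = \int_0^t \sqrt{a(s)b(s)}\,ds$. The coefficient and force-level choices below are hypothetical:

```python
import numpy as np
from scipy.integrate import cumulative_trapezoid

# Constant ratio of attrition-rate coefficients, so T_X(t) = tanh(tau(t));
# tabulate T_X on a grid and invert it by interpolation to get t_X^a from
# the t0 = 0 form of (45).  All numerical choices here are hypothetical.
a = lambda t: 1.0 + t
b = lambda t: 0.25 * (1.0 + t)               # constant ratio a/b = 4

t = np.linspace(0.0, 5.0, 2001)
tau = cumulative_trapezoid(np.sqrt(a(t) * b(t)), t, initial=0.0)
T_X = np.tanh(tau)                           # the tabulation of T_X(t)

x0, y0 = 1.0, 3.0                            # X is the weaker side
arg = x0 / (y0 * np.sqrt(a(0.0) / b(0.0)))   # argument of T_X^{-1} in (45)
t_X_a = float(np.interp(arg, T_X, t))        # invert the tabulation
print(t_X_a)                                 # ~0.29 for these numbers
```

Annihilation within a given finite time $t_g$ is then just the comparison `t_X_a <= t_g`.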
Let us now apply the above general theory to (11) with the general power attrition-rate
coefficients (12). We observe that in this case $t_0 = -C$, where $C \ge 0$. In order that Condition (A)
holds we must have $\mu, \nu > -1$, and then Condition (B) is satisfied.

As we have seen in Section 4, our theory of force-annihilation prediction depends on knowing
$Q^*$, the solution to the auxiliary parity-condition problem. For the power attrition-rate coefficients
(12), it is more convenient, however, to determine $Q^*$ via the modified auxiliary parity-condition
problem (30) (see also (34)). Hence, we apply the transformations (27) and (28) to (17). For the
coefficients (12), equations (30) take the form

(46) $de_X^-/ds = -e_Y^-$ with $e_X^-(s=0) = 1$,
     $de_Y^-/ds = -s^\beta(1 + \gamma/s^\alpha)^\nu e_X^-$ with $e_Y^-(s=0) = Z$,

where the parameter $K$ in (27) is given by $K = (\sqrt{k_a k_b}/(\mu+1))^{2p-1}$; and we have $p = (\mu+1)/(\mu+\nu+2)$,

$q = 1-p$, $\alpha = 1/(\mu+1)$, $\beta = (\nu-\mu)/(\mu+1)$, and $\gamma = A(\sqrt{k_a k_b}/(\mu+1))^{2/(\mu+\nu+2)}$. For $\mu, \nu > -1$, we have $0 < p, q < 1$.

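The parameter map just stated is purely arithmetic and can be packaged directly (the function wrapper is our own):

```python
import math

# Parameter map from the power attrition-rate coefficients (12) to the
# transformed problem (46), as given in the text above.
def transform_params(mu, nu, ka=1.0, kb=1.0, A=0.0):
    assert mu > -1.0 and nu > -1.0            # Condition (A)
    p = (mu + 1.0) / (mu + nu + 2.0)
    q = 1.0 - p
    alpha = 1.0 / (mu + 1.0)
    beta = (nu - mu) / (mu + 1.0)
    gamma = A * (math.sqrt(ka * kb) / (mu + 1.0)) ** (2.0 / (mu + nu + 2.0))
    return p, q, alpha, beta, gamma

# mu = nu = 1 (linear coefficients, no offset) gives p = q = 1/2, beta = 0.
p, q, alpha, beta, gamma = transform_params(mu=1.0, nu=1.0)
print(p, q, alpha, beta, gamma)   # 0.5 0.5 0.5 0.0 0.0
```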

After we have solved the above modified auxiliary parity-condition problem (i.e. determined
$Z = Z^*$ for (46) such that (34) holds), we have all the information required to determine, without
explicitly computing the entire force-level trajectories, whether or not force annihilation occurs in
battles modelled with (12). We may apply Theorem 4 via (28) (possibly using (26)) to see who can
be annihilated and use results such as (40) (or equivalent ones for hyperbolic-like power Lanchester
functions (see Taylor and Brown [25])) to see if force annihilation occurs in a given finite time
(for example, for battles modelled by (13)). When $C = 0$ for the coefficients (12) (e.g. for the model
(13), $R_0 = R_a < R_\beta$ from (15)), we have by Corollary 4.1

COROLLARY 4.2: For combat between two homogeneous forces modelled by (11) and (12)
with $C = 0$, the X force can be annihilated if and only if

$x_0/y_0 < (1/Z^*)(\sqrt{k_a k_b}/(\mu+1))^{1-2p}\sqrt{k_a/k_b}$,

where $Z^* = Z^*(\gamma, \mu, \nu)$ is such that (34) holds for (46).

We will next give exact analytic results for the case of no offset (i.e. $A = 0 \Rightarrow \gamma = 0$) and discuss the
difficulties of determining $Z^*$ when there is offset (i.e. $A, \gamma > 0$). Moreover, results for the special
case of no offset help provide a lower bound for $Z^*$ in the general case.



When the offset parameter $A = 0$, equations (46) become for $\mu, \nu > -1$

(47) $de_X^-/ds = -e_Y^-$ with $e_X^-(s=0) = 1$,
     $de_Y^-/ds = -s^\beta e_X^-$ with $e_Y^-(s=0) = Z$.

Solving (47) by successive approximations (see pp. 757-760 of Taylor [22]), we obtain

(48) $e_X^-(s; Z) = \sum_{k=0}^{\infty} p^{2k} s^{k(\beta+2)} \Big/ \prod_{j=1}^{k} j(j-p) \;-\; Z \sum_{k=0}^{\infty} p^{2k} s^{k(\beta+2)+1} \Big/ \prod_{j=1}^{k} j(j+p)$,

which is not the most useful result for large $s$. Without explicitly computing $e_X^-(s; Z)$ it is impos-
sible to determine from (48) how $e_X^-(s; Z)$ behaves for increasing $s$. However, let us write $e_X^-(s; Z)$ as

(49) $e_X^-(s; Z) = \Gamma(q)p^{-q}\{A_\beta(s) + [1 - Zp^{q-p}\Gamma(p)/\Gamma(q)]\, ps^{1/2} I_p(S)\}$

(observe that $e_X^-$ satisfies the generalized Airy equation (see Swanson and Headley [19])

$d^2 e_X^-/ds^2 - s^\beta e_X^- = 0$,

which is well known to be reducible to Bessel's equation) for any arbitrary $Z$, where
$A_\beta(s)$ denotes the generalized Airy function of the first kind of order $\beta$ (see Swanson and Headley
[19]), $I_p(S)$ denotes the modified Bessel function of the first kind of order $p$, and $S = 2ps^{(\beta+2)/2}$. Also,

(50) $e_Y^-(s; Z) = \Gamma(q)p^{-q}\{(2p/\pi)(\sin p\pi)s^{(\beta+1)/2}K_q(S) - [1 - Zp^{q-p}\Gamma(p)/\Gamma(q)]\, dF/ds(s)\}$,

where $K_q(S)$ denotes the modified Bessel function of the third kind (also called Macdonald's
function) of order $q$ and $F(s) = ps^{1/2}I_p(S)$.

The behavior of $A_\beta(s)$ for $s \ge 0$ is readily seen from (see Swanson and Headley [19])

$A_\beta(s) = (2p/\pi)(\sin p\pi)s^{1/2}K_p(S)$.

It is readily seen (see pp. 119 and 123 of Lebedev [14] or pp. 250-251 of Olver [17]) that $K_p(S)$ is
strictly decreasing and positive, and

$\lim_{S \to +\infty} K_\nu(S) = 0$,

where $\nu$ is any real number. It is then clear (see also p. 1404 of [19]) that

$A_\beta(s), I_p(S) > 0 \ \forall s > 0$, $\lim_{s \to +\infty} A_\beta(s) = 0$, and $\lim_{s \to +\infty} I_p(S) = +\infty$.

Consequently, the requirement that (34) holds (i.e. that $e_X^-$ and $e_Y^-$ behave like strictly decaying
exponentials) means that the second term in the expressions (49) and (50) must vanish. Thus, for
$\gamma = 0$ we have

(51) $Z^*(\gamma = 0;\, \mu, \nu) = p^{p-q}\,\Gamma(q)/\Gamma(p)$.


Hence, Theorem 4 becomes†:

THEOREM 5: Consider combat between two homogeneous forces described by (11) with
attrition-rate coefficients (12) with $A = 0$. Assume that the model applies for all time and that Y
"wins" when $x(t_f) = 0$ with $y(t_f) > 0$. Then Y will win if and only if

$x_0/y_0 < \sqrt{k_a/k_b}\,(\sqrt{k_a k_b}/(\mu+1))^{q-p}\, e_X^-(s = s_0;\, Z^*)/e_Y^-(s = s_0;\, Z^*)$,

where

$Z^* = Z^*(\gamma = 0;\, \mu, \nu) = p^{p-q}\,\Gamma(q)/\Gamma(p)$,

$e_X^-(s; Z^*) = p^{-q}\,\Gamma(q)A_\beta(s)$,

and

$e_Y^-(s; Z^*) = (2p/\pi)(\sin p\pi)p^{-q}\,\Gamma(q)s^{(\beta+1)/2}K_q(S)$.

For $C = 0$, we have $s_0 = 0$ so that Y will win if and only if

$x_0/y_0 < \sqrt{k_a/k_b}\,(\sqrt{k_a k_b}/(\mu+\nu+2))^{q-p}\,\Gamma(p)/\Gamma(q)$.

We observe that the infinite series form (48) is of no value for determining asymptotic properties
of the solution $e_X^-$ to (47) (and consequently $Z^*$), although it is useful for computational purposes.

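The closed form (51) and the $C = 0$ annihilation threshold of Theorem 5 are directly computable; a minimal sketch (the function packaging is our own):

```python
from math import sqrt, gamma

# Evaluation of (51): Z*(gamma=0; mu, nu) = p^(p-q) Gamma(q)/Gamma(p),
# and of the C = 0 win condition of Theorem 5.
def z_star_no_offset(mu, nu):
    p = (mu + 1.0) / (mu + nu + 2.0)
    q = 1.0 - p
    return p ** (p - q) * gamma(q) / gamma(p)

def y_wins_C0(x0, y0, mu, nu, ka, kb):
    p = (mu + 1.0) / (mu + nu + 2.0)
    q = 1.0 - p
    thresh = (sqrt(ka / kb) * (sqrt(ka * kb) / (mu + nu + 2.0)) ** (q - p)
              * gamma(p) / gamma(q))
    return x0 / y0 < thresh

# mu = nu gives p = q = 1/2, hence Z* = 1 (the square-law-like case).
print(z_star_no_offset(1.0, 1.0))               # 1.0
print(y_wins_C0(1.0, 2.0, 1.0, 1.0, 1.0, 1.0))  # True
```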

When the offset parameter $A > 0$, explicit analytic results for $Z^*$ are apparently not possible.
Before considering the general case of $\mu, \nu > -1$ and $\gamma > 0$, we will find it instructive to consider
the special case of offset linear attrition-rate coefficients (i.e. $\gamma > 0$ with $\mu = \nu = 1$) studied by Bonder
and Farrell [4]. This will show us why analytic results for $Z^*$ are elusive in cases of positive offset.

For $\gamma > 0$ with $\mu = \nu = 1$, equations (30) and (31) become

(52) $d^2x/ds^2 - (1 + \gamma/\sqrt{s})x = 0$,

†For the case of power attrition-rate coefficients with no offset (i.e. $A = 0$ in (12)), the second annihilation condi-
tion given in Theorem 5 (i.e. the one for $C = 0$) and an equivalent form of the first (i.e. the one for $C > 0$) may be
developed by inspection when one expresses, for example, the time history of the X force level (which satisfies (18))
in terms of the so-called generalized Airy functions (see Swanson and Headley [19]). For example, for $C = 0$ we have

$x(t) = (p^q/(2p))\{X + Y\}A_\beta(T) + (p^q/(2p))\{X - Y\}B_\beta(T)$,

where $A_\beta$ and $B_\beta$ denote the generalized Airy functions of the first and second kinds of order $\beta$,

$X = x_0\,\Gamma(q)$, $\quad Y = y_0\sqrt{k_a/k_b}\,\Gamma(p)(\sqrt{k_a k_b}/(\mu+\nu+2))^{1-2p}$,

and

$T = (\sqrt{k_a k_b}/(\mu+1))^{2p}\, t^{\mu+1}$.

The result given in Theorem 5 for $C = 0$ follows from the properties of the generalized Airy functions

(i.e. $A_\beta(\xi), B_\beta(\xi) > 0 \ \forall \xi > 0$, $\lim_{\xi \to +\infty} A_\beta(\xi) = 0$, and $\lim_{\xi \to +\infty} B_\beta(\xi) = +\infty$)

and the above representation for $x(t)$. Unfortunately, this result does not generalize to other cases of interest, al-
though it did motivate our general theory of force annihilation developed in this paper. The generalized Airy functions
may be considered to be generalizations of the exponential function (see p. 446 of [1] or p. 393 of [17] for plots of
the standard (i.e. $\beta = 1$) Airy functions) and arise in the study of the asymptotic behavior of solutions to certain
differential equations.


with initial conditions $x(s=0) = x_0$ and $dx/ds(s=0) = -\sqrt{k_a/k_b}\, y_0$, and

(53) $de_X^-/ds = -e_Y^-$ with $e_X^-(s=0) = 1$,
     $de_Y^-/ds = -(1 + \gamma/\sqrt{s})e_X^-$ with $e_Y^-(s=0) = Z$.


Let us now consider solving the above modified auxiliary parity-condition problem (i.e. deter-
mining $Z = Z^*$ for (53) such that (34) holds). Unfortunately, (52) does not correspond to any set
of special functions we can find (see footnote ‡, p. 350). However, solving (53) by successive approxi-
mations, we may write

(54) $e_X^-(s) = h(s) - Zw(s)$, and $e_Y^-(s) = ZH(s) - W(s)$,

where $h(s) = h(s; \gamma)$, $w(s)$, $H(s)$, and $W(s)$ denote the auxiliary offset linear Lanchester functions introduced by Taylor
and Brown [25]. Infinite series representations of these functions are given in [25]. Subsequent
research has shown that these hyperbolic-like Lanchester functions possess the following properties:

PROPERTY 1: $dh/ds = W$, $dW/ds = (1 + \gamma/\sqrt{s})h$;

PROPERTY 2: $dw/ds = H$, $dH/ds = (1 + \gamma/\sqrt{s})w$;

PROPERTY 3: $h(s)H(s) - w(s)W(s) = 1 \ \forall s$;

PROPERTY 4: $h(s=0) = H(s=0) = 1$;

PROPERTY 5: $w(s=0) = W(s=0) = 0$;

PROPERTY 6: $h(s; \gamma=0) = H(s; \gamma=0) = \cosh s$;

PROPERTY 7: $w(s; \gamma=0) = \sinh s = W(s; \gamma=0)$.
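These properties are easy to check numerically by integrating the defining systems of Properties 1 and 2 from the initial values of Properties 4 and 5; a minimal sketch (our own, using a tiny starting abscissa $s_0 > 0$ since the $1/\sqrt{s}$ singularity is integrable):

```python
import numpy as np
from scipy.integrate import solve_ivp

# Integrate dh/ds = W, dW/ds = (1 + gamma/sqrt(s)) h and dw/ds = H,
# dH/ds = (1 + gamma/sqrt(s)) w, then spot-check Properties 3, 6, and 7.
def rhs(s, y, gamma):
    h, W, w, H = y
    c = 1.0 + gamma / np.sqrt(s)
    return [W, c * h, H, c * w]

def hWwH(s_end, gamma, s0=1e-10):
    sol = solve_ivp(rhs, (s0, s_end), [1.0, 0.0, 0.0, 1.0],
                    args=(gamma,), rtol=1e-10, atol=1e-12)
    return sol.y[:, -1]

h, W, w, H = hWwH(2.0, gamma=0.0)
print(h - np.cosh(2.0), w - np.sinh(2.0))   # ~0, ~0  (Properties 6 and 7)
h2, W2, w2, H2 = hWwH(2.0, gamma=0.5)
print(h2 * H2 - w2 * W2)                    # ~1      (Property 3)
```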

Unfortunately, information on the asymptotic behavior of the auxiliary offset linear Lanchester
functions for large $s > 0$, which is needed to solve the modified auxiliary parity-condition problem
(53), is lacking and apparently not obtainable by the standard methods involving integral repre-
sentations (see Ince [11] or Olver [17]). Consequently, we have not been able to develop an explicit
analytic expression for $Z^*(\gamma > 0, \mu = 1, \nu = 1)$, although we give upper and lower bounds for $Z^*$ in
the next section for general $\mu, \nu > -1$. Additionally, we should point out that there are computa-
tional difficulties in searching for $Z^*$ via its definition (34): (a) one does not know how large to take
$s$ for "satisfactory" results, and (b) numerical difficulties occur in evaluating $e_X^-(s; Z)$ and $e_Y^-(s; Z)$ as
given by (54) for large values of $s$ (since we are taking the difference of two very large numbers
and, at least on a digital computer, can retain only a limited number of significant digits in these
numbers).

Equation (52) looks deceptively simple. Using variation of parameters, we may also express
its solution as

(55) $x(s) = x_0\cosh s - y_0\sqrt{k_a/k_b}\,\sinh s + \gamma\int_0^s \sinh(s-\sigma)\,x(\sigma)/\sqrt{\sigma}\,d\sigma$.

Although (55) is a simple-looking expression, this Volterra integral equation is, unfortunately, no
easier to solve than (52) and leads to the same results as given by Taylor [22] and Taylor and
Brown [25].



We will now develop upper and lower bounds for $Z^*(\gamma > 0, \mu, \nu)$. These bounds establish the
existence of $Z^*$ (and consequently $Q^*$) for general power attrition-rate coefficients (12) by the
continuous dependence of solutions to (46) on the initial conditions.

The following two lemmas will be used to obtain an upper bound for $Z^*(\gamma, \mu, \nu)$ for $\mu, \nu > -1$.

LEMMA 1: For $\delta > 1$ and $x, y > 0$, $2^{\delta-1}\{x^\delta + y^\delta\} \ge (x+y)^\delta$.

PROOF: For $\delta > 1$, $f(x) = x^\delta$ is a convex function. A well-known theorem for convex functions
says that

$f((x+y)/2) \le \{f(x) + f(y)\}/2$,

whence follows the lemma. Q.E.D.

LEMMA 2: For $\delta < 1$ and $x, y > 0$, $x^\delta + y^\delta \ge (x+y)^\delta$.

PROOF: Dividing by $(x+y)^\delta$, we need to show that $[x/(x+y)]^\delta + [y/(x+y)]^\delta \ge 1$. If $x$ and $y > 0$,
then $x/(x+y)$ and $y/(x+y)$ are $< 1$. Hence, for any $\delta < 1$ we have $[x/(x+y)]^\delta > x/(x+y)$, so that

$[x/(x+y)]^\delta + [y/(x+y)]^\delta > [x/(x+y)] + [y/(x+y)] = 1$. Q.E.D.
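A quick Monte Carlo spot-check of both inequalities (our own sanity check, with arbitrary sampling ranges):

```python
import numpy as np

# Lemma 1 (delta > 1) and Lemma 2 (delta < 1) checked on random x, y > 0.
rng = np.random.default_rng(0)
x = rng.uniform(0.1, 10.0, 1000)
y = rng.uniform(0.1, 10.0, 1000)
d1, d2 = 2.5, 0.5
ok1 = bool(np.all(2 ** (d1 - 1) * (x ** d1 + y ** d1) >= (x + y) ** d1))
ok2 = bool(np.all(x ** d2 + y ** d2 >= (x + y) ** d2))
print(ok1, ok2)   # True True
```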

Using the above lemmas, we now prove Theorem 6. The upper bound to be given in Theorem
6 might be improved upon, although we feel that for computer determination of $Z^*$ by interval
search it is not essential to have a better bound.

THEOREM 6: For $\mu > -1$ and $\nu \ge 1$, we have

$Z^*(\gamma, \mu, \nu) \le 1 + 2^{\nu-1}(\mu+1)^2/[(\nu+1)(\mu+\nu+2)] + 2^{\nu-1}\gamma^\nu(\mu+1)^2/(\mu+2)$;

while for $\mu > -1$ and $-1 < \nu < 1$, we have

$Z^*(\gamma, \mu, \nu) \le 1 + (\mu+1)^2/[(\nu+1)(\mu+\nu+2)] + \gamma^\nu(\mu+1)^2/(\mu+2)$.

PROOF: Recalling (34), we have from the first equation of (46) that $e_X^-(s; Z^*) \le 1$ for $s \ge 0$;
and from the second equation of (46) we then obtain for $s > 0$ and $\nu \ge 1$

(56) $de_Y^-/ds \ge -s^\beta(1 + \gamma/s^\alpha)^\nu \ge -2^{\nu-1}s^\beta(1 + \gamma^\nu/s^{\alpha\nu})$,

the latter inequality being a consequence of Lemma 1. From (30) we obtain

(57) $e_Y^-(s) \ge Z^* - 2^{\nu-1}\{(\mu+1)/(\nu+1)\}s^{(\nu+1)/(\mu+1)} - 2^{\nu-1}\gamma^\nu(\mu+1)s^{1/(\mu+1)}$.

Using (57) and considering the first equation of (46), we obtain $e_X^-(s; Z^*) \le U(s; Z^*)$, where

$U(s; Z^*) = 1 + 2^{\nu-1}\{(\mu+1)^2/[(\nu+1)(\mu+\nu+2)]\}s^{(\mu+\nu+2)/(\mu+1)} + 2^{\nu-1}\gamma^\nu\{(\mu+1)^2/(\mu+2)\}s^{(\mu+2)/(\mu+1)} - Z^*s$.

Since we must have $0 \le e_X^-(s; Z^*) \le U(s; Z^*)$ for all $s \ge 0$, it follows that for $s = 1$ we must have
$U(s = 1; Z^*) \ge 0$, whence follows the theorem for $\nu \ge 1$. Lemma 2 and similar arguments are used to
prove the theorem for $-1 < \nu < 1$. Q.E.D.

Let us now consider the development of a lower bound for $Z^*(\gamma > 0, \mu, \nu)$. Before proving the
key lemma (Lemma 3) for the proof of Theorem 7, we discuss some preliminary considerations.

As shown by Taylor and Parry [26], force annihilation for square-law attrition processes
sometimes may be predicted by considering the force-ratio equation. For equations (46), the
"force ratio" $u = e_X^-/e_Y^-$ satisfies the Riccati equation

(58) $du/ds = s^\beta(1 + \gamma/s^\alpha)^\nu u^2 - 1$,

with initial condition $u(s=0) = 1/Z$. We observe that $u(s_f) = 0$ if X is annihilated (i.e. $e_X^-(s_f) = 0$
but $e_Y^-(s_f) > 0$), and $u(s_f) = +\infty$ if Y is annihilated. For $Z = Z^* = Z^*(\gamma, \mu, \nu)$ such that (34) holds,
we have a "draw," and $u^*(s) = u(s; Z = Z^*) > 0$ and finite for all finite $s \ge 0$. Since (58) has a singu-
larity at $s = 0$, we apply the transformation $\tau = (s^\alpha + \gamma)^{\mu+1}$ to obtain for $\tau \ge \tau_0 = \gamma^{\mu+1}$

(59) $du/d\tau = \tau^\beta u^2 - (1 - \gamma/\tau^\alpha)^\mu$,

with initial condition $u(\tau = \gamma^{\mu+1}) = 1/Z$.

We now develop a lower bound for $Z^*(\gamma > 0, \mu, \nu)$ by comparing results for $\gamma > 0$ with those for
$\gamma = 0$. For $\gamma = 0$, we denote $u$ as $w$ and obtain for $\tau \ge 0$

(60) $dw/d\tau = \tau^\beta w^2 - 1$.

Corresponding to $Z = Z^* = Z^*(\gamma = 0, \mu, \nu)$, we have via (49) and (50)

(61) $w^*(\tau) = \tau^{-\beta/2}K_p(T)/K_q(T) > 0$,

where $T = 2p\tau^{(\beta+2)/2}$. Since $K_\nu(x)$ is finite and $> 0$ for all $\nu$, $x > 0$ (see Lebedev [14], p. 136), we readily
verify that $w^*(\tau)$ is finite for all $\tau > 0$ and that $w^*(\tau = 0) = p^{q-p}\,\Gamma(p)/\Gamma(q)$. Let us observe that if
$w(\tau_1) > w^*(\tau_1) = w(\tau_1; Z^*)$, then we have $w(\tau) > w^*(\tau) \ \forall \tau > \tau_1$ and $w(\tau_2) = +\infty$ for some finite $\tau_2 > \tau_1$,
since $D = w - w^*$ satisfies $dD/d\tau = \tau^\beta(w + w^*)D$. Consequently, $w(\tau; Z)$ corresponds to $Z < Z^*$, and
$w(\tau; Z)$ becomes infinite at some finite time (see equations (49) and (50)). We now state and prove
the key lemma for developing a lower bound for $Z^*$.

LEMMA 3: Let $w^*(\tau)$ be given by (61) and let $u(\tau) = u(\tau; Z)$ satisfy (59) for $\tau \ge \tau_0 = \gamma^{\mu+1}$. Then
if $u(\tau_1) \ge w^*(\tau_1)$, it follows that $u(\tau) > 0 \ \forall \tau \ge \tau_1$ and $u(\tau_2) = +\infty$ for some finite $\tau_2 > \tau_1$. Consequently,
$Z < Z^*(\gamma, \mu, \nu)$.

PROOF: Consider $D = u - w$. It satisfies for $\tau \ge \tau_0$, via (59) and (60), the equation

$dD/d\tau = \tau^\beta(u + w)D + \{1 - (1 - \gamma/\tau^\alpha)^\mu\}$.

If $D(\tau_1) = u(\tau_1) - w(\tau_1) \ge 0$, then $dD/d\tau(\tau_1) > 0$ and $D(\tau) > 0 \ \forall \tau > \tau_1$. Thus, when $u(\tau_1) \ge w^*(\tau_1)$, we
can find $\hat{w}(\tau_2)$ for $\tau_2 > \tau_1$ such that $u(\tau_2) > \hat{w}(\tau_2) > w^*(\tau_2)$, whence follows the lemma from $\hat{w}(\tau) > w(\tau; \hat{Z})$
$\forall \tau > \tau_2$ with $\hat{Z} < Z^*(\gamma = 0, \mu, \nu)$ and the above observation that $w(\tau; \hat{Z})$ becomes infinite. Q.E.D.


Letting $\tau_1 = \tau_0$, we obtain

THEOREM 7: $Z^*(\gamma > 0, \mu, \nu) \ge 1/w^*(\gamma^{\mu+1}) = \tau_0^{\beta/2}K_q(T_0)/K_p(T_0)$, where $\tau_0 = \gamma^{\mu+1}$ and $T_0 = 2p\tau_0^{(\beta+2)/2}$.

Since $w^*(\tau = 0) \ge w^*(\tau)$ for $\beta \ge 0$, we have as an immediate corollary

COROLLARY 7.1: For $\nu \ge \mu$, $Z^*(\gamma > 0, \mu, \nu) \ge p^{p-q}\,\Gamma(q)/\Gamma(p)$.

Let us observe that for $\beta > 0$,
the lower bound given in Corollary 7.1 is weaker than that in Theorem 7.
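The two bounds are directly computable and together bracket $Z^*$ for an interval search; a minimal sketch (the function packaging and example values are our own):

```python
from scipy.special import kv   # modified Bessel function K_nu

# Upper bound of Theorem 6 and lower bound of Theorem 7 for Z*(gamma, mu, nu),
# evaluated as stated in the text (gamma > 0 assumed).
def z_star_bounds(g, mu, nu):
    p = (mu + 1.0) / (mu + nu + 2.0)
    q = 1.0 - p
    beta = (nu - mu) / (mu + 1.0)
    c = 2.0 ** (nu - 1.0) if nu >= 1.0 else 1.0   # Lemma 1 vs Lemma 2 branch
    upper = (1.0 + c * (mu + 1.0) ** 2 / ((nu + 1.0) * (mu + nu + 2.0))
             + c * g ** nu * (mu + 1.0) ** 2 / (mu + 2.0))
    tau0 = g ** (mu + 1.0)                        # tau_0 = gamma^(mu+1)
    T0 = 2.0 * p * tau0 ** ((beta + 2.0) / 2.0)
    lower = tau0 ** (beta / 2.0) * kv(q, T0) / kv(p, T0)
    return lower, upper

lo_b, up_b = z_star_bounds(0.5, 1.0, 1.0)
print(lo_b, up_b)   # a bracket for the interval search; here 1.0 and ~2.17
```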



As we have seen above in Section 4, force-annihilation prediction depends on knowing the
parity-condition parameter $Q^*$, which may be called "the Y equivalent of an X force of unit
strength." We have explicitly determined $Q^*$ (via determining $Z^*$ for the modified auxiliary parity-
condition problem (46)) for the power attrition-rate coefficients (12) in the case of no offset, i.e.
$A = 0$. Tabulations, for example, of the new modified exponential-like general Lanchester functions
$e_X^-(s; Z^*)$ and $e_Y^-(s; Z^*)$ would facilitate force-annihilation prediction (see Theorem 5). It remains
to determine $Q^*$ for cases of positive offset, i.e. $A > 0$. As discussed in Section 7, analytic results
for $Z^*$ in the modified auxiliary parity-condition problem (46) with $\gamma = A(\sqrt{k_a k_b}/(\mu+1))^{2/(\mu+\nu+2)} > 0$
are apparently not possible by the usual analytical methods, so we must turn to numerical methods.
It appears that a large number of cases of tactical interest (see Taylor and Brown [25]) would be
covered by determining $Z^*$ for $\mu, \nu = 0, 1, 2, 3$ and for a range of values of $\gamma > 0$. One would be
interested in, for example, plotting $Z^*$ versus $\gamma$ for fixed values of $\mu$ and $\nu$.

Since we have developed upper and lower bounds for $Z^*$ when $\gamma > 0$ (see Theorems 6 and 7),
we can use standard one-dimensional search techniques (see, for example, Wilde [28]) to calculate
an approximate value of $Z^*$ with any predetermined degree of accuracy, depending, of course, on
how much computation we wish to do. Since (34) must hold for all $s \ge 0$, we must determine how
long (i.e. for how large a value of $s$) to carry out computations of $e_X^-(s; \hat{Z})$ and $e_Y^-(s; \hat{Z})$ in the
modified auxiliary parity-condition problem (46) for a given trial value of $Z^*$ (denoted as $\hat{Z}$) to
see whether it is too large or too small. In the future we will show that by considering the Riccati
equation (59) one can "cut off" computations for a given value of $\hat{Z}$ well before either of the two
annihilation conditions (i.e. $e_X^- < 0$ or $e_Y^- < 0$) is actually reached.
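The interval search can be sketched as a bisection on the trial value $\hat{Z}$; the fixed integration cutoff below is a pragmatic choice of ours, not the Riccati cut-off just promised, and all tolerances are illustrative:

```python
import numpy as np
from scipy.integrate import solve_ivp

# For a trial Z, integrate (46): de_X/ds = -e_Y,
# de_Y/ds = -s^beta (1 + gamma/s^alpha)^nu e_X, from s ~ 0 with e_X = 1,
# e_Y = Z.  If e_Y crosses zero first, Z was too small; if e_X crosses zero
# first, Z was too large (the parity solution (34) keeps both positive).
def trial_sign(Z, mu, nu, g, s_max=15.0):
    alpha, beta = 1.0 / (mu + 1.0), (nu - mu) / (mu + 1.0)
    def rhs(s, y):
        return [-y[1], -s ** beta * (1.0 + g / s ** alpha) ** nu * y[0]]
    hit_x = lambda s, y: y[0]
    hit_y = lambda s, y: y[1]
    hit_x.terminal = hit_y.terminal = True
    sol = solve_ivp(rhs, (1e-9, s_max), [1.0, Z], events=[hit_x, hit_y],
                    rtol=1e-10, atol=1e-12)
    if sol.t_events[0].size:
        return +1          # e_X^- hit zero: Z too large
    if sol.t_events[1].size:
        return -1          # e_Y^- hit zero: Z too small
    return 0               # undecided up to s_max

def z_star_search(mu, nu, g, lo=0.1, hi=5.0, iters=30):
    for _ in range(iters):   # Theorems 6 and 7 could supply lo and hi
        mid = 0.5 * (lo + hi)
        if trial_sign(mid, mu, nu, g) >= 0:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)

print(z_star_search(1.0, 1.0, 0.0))   # recovers Z*(gamma=0; 1, 1) = 1
```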

As discussed at the end of Section 4, prediction of force annihilation within a given finite
time involves the use of tabulations of the quotient of two linearly independent general Lanchester
functions. We have indicated in Section 4 that the hyperbolic-tangent-like Lanchester functions
(e.g. $T_X(t)$ as defined by (42)) are to be preferred because of the accuracy of their numerical
computation. Thus, there is a need for tabulations of $T_X(t)$, whose range is $[0, 1/Q^*)$ for $t \in [t_0, +\infty)$.
For the power attrition-rate coefficients (12) with no offset, however, the power Lanchester functions (also
called Lanchester-Clifford-Schläfli (or LCS) functions) were inappropriately defined in
Taylor and Brown [25] to yield such tabulations. Thus, our newer theory of force-annihilation
prediction, which also involves tabulations of canonical solutions (i.e. canonical Lanchester func-
tions) to variable-coefficient Lanchester-type equations of modern warfare, has suggested some
refinements in the definition of Taylor and Brown's [25] auxiliary power Lanchester functions.†
It would be desirable, then, to redefine the LCS functions to fit within the framework of Section 4
and to develop tabulations of their quotient. If this were done, linear combat models with
power attrition-rate coefficients (no offset) could be analyzed with somewhat the same ease as
constant-coefficient linear models (i.e. (1)).

†Although theoretically results for power attrition-rate coefficients with no offset are expressible in terms of
"known" transcendental functions, new Lanchester functions were introduced by Taylor and Brown [25] because
of the lack of tabulations of these functions in many cases of interest.



In this section we show that all our above force-annihilation results (except those for force
annihilation within a fixed finite time) also apply to a special case of a more general model. More-
over, comparison techniques may be used to extend these results in weakened form to the general
case of this more comprehensive model.

Let us consider the following Lanchester-type equations

(62) $dx/dt = -a(t)y - \beta(t)x$ with $x(t=0) = x_0$,
     $dy/dt = -b(t)x - \alpha(t)y$ with $y(t=0) = y_0$,

where $a(t)$, $b(t)$, $\alpha(t)$, and $\beta(t) \ge 0$. We may think of these equations as modeling, for example,
aimed-fire combat between two homogeneous infantry forces with superimposed effects of support-
ing weapons (which are not subject to attrition and deliver area fire against the enemy infantry).
(See Taylor and Parry [26] for a further discussion of this model (62).) In this case, $\alpha(t)$ and $\beta(t)$
are attrition-rate coefficients which reflect the fire effectiveness of the supporting weapons [26].
Then, the force ratio $u = x/y$ satisfies the generalized Riccati equation

(63) $du/dt = b(t)u^2 + \{\alpha(t) - \beta(t)\}u - a(t)$ with $u(t=0) = x_0/y_0$.

For equal effectiveness of the supporting fires [i.e. $\alpha(t) = \beta(t)$], equation (63) simplifies to

(64) $du/dt = b(t)u^2 - a(t)$,

which is the same Riccati equation satisfied by the force ratio for the model (11). Hence, when
$\alpha(t) = \beta(t) \ \forall t \ge 0$, a battle's outcome (in terms of the force ratio) is the same for the two models
(11) and (62), although the battle ends more quickly for (62). Thus, in this special case all our
above results on force annihilation without time limitation (e.g. Theorems 3 through 5 and their
corollaries) developed for (11) (in general or with the coefficients (12)) also apply to the more general
model (62). Furthermore, comparison techniques (see, for example, Hille [10]) may be used to
extend these results in weakened form to (62). Consequently, we see that the force-annihilation
results developed in this paper are indeed of a fundamental nature.
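The equal-supporting-fire observation can be illustrated numerically: with $\alpha(t) = \beta(t)$, models (11) and (62) produce identical force-ratio histories, while (62) attrites the force levels faster. All coefficient choices below are hypothetical:

```python
import numpy as np
from scipy.integrate import solve_ivp

# Model (11): aimed fire only; model (62): the same plus equal supporting
# fire alpha(t) = beta(t).  The force ratio x/y should coincide.
a = lambda t: 0.06 * (1.0 + t)
b = lambda t: 0.04 * (1.0 + t)
alpha = beta = (lambda t: 0.02)

def model11(t, z):
    x, y = z
    return [-a(t) * y, -b(t) * x]

def model62(t, z):
    x, y = z
    return [-a(t) * y - beta(t) * x, -b(t) * x - alpha(t) * y]

z0, T = [1.0, 1.0], 5.0
s1 = solve_ivp(model11, (0.0, T), z0, dense_output=True, rtol=1e-10)
s2 = solve_ivp(model62, (0.0, T), z0, dense_output=True, rtol=1e-10)
tt = np.linspace(0.0, T, 101)
u1 = s1.sol(tt)[0] / s1.sol(tt)[1]
u2 = s2.sol(tt)[0] / s2.sol(tt)[1]
ratio_gap = np.max(np.abs(u1 - u2))
print(ratio_gap)                        # ~0: same force-ratio history
print(s2.sol(T)[0] < s1.sol(T)[0])      # True: (62) attrites faster
```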


We have presented a mathematical theory for predicting force annihilation for variable-
coefficient Lanchester-type equations of "modern warfare" for combat between two homogeneous
forces without explicitly computing force-level trajectories.† Our force-annihilation theory pro-
vides guidance for certain parameter determinations and development of tabulations of Lanchester
functions (beyond those suggested in [25]) that would allow one to parametrically analyze variable-
coefficient models with somewhat the same facility as constant-coefficient ones. We have shown that
force annihilation can be predicted from initial conditions, without explicitly computing force-level

†In his well-known survey paper on the Lanchester theory of combat, Dolansky [9] suggested, as one of
several problems for future research, developing outcome-predicting relations without solving in detail.


trajectories, by knowing a parity-condition parameter $Q^*$, which is the solution to a canonical aux-
iliary parity-condition problem. In general, this prediction would be facilitated by having tabula-
tions of certain Lanchester functions available. The parity-condition parameter $Q^*$ was shown to be
related to the range of the quotient of two hyperbolic-like general Lanchester functions introduced
by Taylor and Brown [25]. Consequently, our force-annihilation theory not only provides new
information about the mathematical properties of hyperbolic-like Lanchester functions but also
provides guidance for selecting canonical Lanchester functions.

We applied our general theory to the specific case of general power attrition-rate coefficients.
Considering a modified auxiliary parity-condition problem, we explicitly determined $Q^*$ (via $Z^*$
of the modified problem) for power attrition-rate coefficients with no offset and gave upper and lower
bounds for $Z^*$ for cases of positive offset. We finally showed that certain of our force-annihilation
results also apply to a more general linear differential equation combat model.

These results may be used in the analysis of the dynamic combat interactions between two homo-
geneous forces with time- (or range-) dependent weapon-system capabilities. There is interest today
in such analytic models because of improved operations research techniques for predicting Lan-
chester attrition-rate coefficients, in particular their temporal variations (see [2] through [5] and
[25]). Further discussion of such applications may be found in Bonder and Farrell [4], Bonder and
Honig [5], Taylor [22], and Taylor and Brown [25].


REFERENCES

[1] Abramowitz, M., and I. Stegun (Editors), Handbook of Mathematical Functions, National
Bureau of Standards Applied Mathematics Series, No. 55, Washington, D.C. (1964).

[2] Bonder, S., "The Lanchester Attrition-Rate Coefficient," Operations Research 15, 221-232 (1967).

[3] Bonder, S., "The Mean Lanchester Attrition Rate," Operations Research 18, 179-181 (1970).

[4] Bonder, S., and R. Farrell (Editors), "Development of Models for Defense Systems Planning,"
Report No. SRL 2147 TR 70-2 (U), Systems Research Laboratory, The University of
Michigan, Ann Arbor, Michigan (Sept. 1970).

[5] Bonder, S., and J. Honig, "An Analytic Model of Ground Combat: Design and Application,"
Proceedings U.S. Army Operations Research Symposium 10, 319-394 (1971).
[6] Brackney, H., "The Dynamics of Military Combat," Operations Research 7, 30-44 (1959).

[7] von Clausewitz, C., On War, edited with an introduction by A. Rapoport (Penguin Books,
Ltd., Harmondsworth, Middlesex, England, 1968).

[8] Coddington, E., and N. Levinson, Theory of Ordinary Differential Equations (McGraw-Hill,
New York, 1955).

[9] Dolansky, L., "Present State of the Lanchester Theory of Combat," Operations Research
12, 344-358 (1964).

[10] Hille, E., Lectures on Ordinary Differential Equations (Addison-Wesley, Reading, Massa-
chusetts, 1969).

[11] Ince, E., Ordinary Differential Equations (Longmans, Green and Co., London, 1927) (reprinted
by Dover Publications, Inc., New York, 1956).


[12] Kamke, E., Differentialgleichungen, Lösungsmethoden und Lösungen, Band 1, Gewöhnliche
Differentialgleichungen, 3. Auflage (Akademische Verlagsgesellschaft, Leipzig, 1944) (re-
printed by Chelsea Publishing Co., New York, 1971).

[13] Lanchester, F. W., "Aircraft in Warfare: The Dawn of the Fourth Arm - No. V, The Principle
of Concentration," Engineering 98, 422-423 (1914) (reprinted on pp. 2138-2148 of The
World of Mathematics, IV, J. Newman (Editor) (Simon and Schuster, New York, 1956)).

[14] Lebedev, N. N., Special Functions and Their Applications (Prentice-Hall, Englewood Cliffs,
New Jersey, 1965) (reprinted by Dover Publications, Inc., New York, 1972).

[15] Lee, E., and L. Markus, Foundations of Optimal Control Theory (John Wiley, New York, 1967).

[16] McCloskey, J., "Of Horseless Carriages, Flying Machines, and Operations Research: A Tribute
to Frederick William Lanchester (1868-1946)," Operations Research 4, 141-147 (1956).

[17] Olver, F. W. J., Asymptotics and Special Functions (Academic Press, New York, 1974).

[18] Schwarz, H. A., "Über diejenigen Fälle, in welchen die Gaussische hypergeometrische Reihe
eine algebraische Function ihres vierten Elementes darstellt," Journal für die Reine und
Angewandte Mathematik (Berlin) 75, 292-335 (1872) (also pp. 211-259 in Gesammelte
Mathematische Abhandlungen, Zweiter Band, J. Springer, Berlin, 1890 (reprinted by Chelsea
Publishing Co., New York, 1972)).

[19] Swanson, C., and V. Headley, "An Extension of Airy's Equation," SIAM Journal on Applied
Mathematics 15, 1400-1412 (1967).

[20] Taylor, J., "A Note on the Solution to Lanchester-Type Equations with Variable Coefficients,"
Operations Research 19, 709-712 (1971).

[21] Taylor, J., "Lanchester-Type Models of Warfare and Optimal Control," Naval Research
Logistics Quarterly 21, 79-106 (1974).

[22] Taylor, J., "Solving Lanchester-Type Equations for 'Modern Warfare' with Variable Coeffi-
cients," Operations Research 22, 756-770 (1974).

[23] Taylor, J., "Survey on the Optimal Control of Lanchester-Type Attrition Processes," pre-
sented at the Symposium on the State-of-the-Art of Mathematics in Combat Models,
June 1973 (also Tech. Report NPS55Tw74031, Naval Postgraduate School, Monterey,
Calif., March 1974).

[24] Taylor, J., "Target Selection in Lanchester Combat: Heterogeneous Forces and Time-
Dependent Attrition-Rate Coefficients," Naval Research Logistics Quarterly 21, 683-704 (1974).

[25] Taylor, J., and G. Brown, "Canonical Methods in the Solution of Variable-Coefficient
Lanchester-Type Equations of Modern Warfare," Operations Research 24, 44-69 (1976).

[26] Taylor, J., and S. Parry, "Force-Ratio Considerations for Some Lanchester-Type Models
of Warfare," Operations Research 23, 522-533 (1975).

[27] Weiss, H., "Lanchester-Type Models of Warfare," pp. 82-98 in Proceedings of the First Inter-
national Conference on Operational Research (John Wiley, New York, 1957).

[28] Wilde, D., Optimum Seeking Methods (Prentice-Hall, Englewood Cliffs, New Jersey, 1964).




L. Peter Jennergren

Odense University
Odense, Denmark


Three different solutions to a very simple transfer pricing problem are outlined 
and contrasted. These are labeled by their authors: Hirshleifer, Enzer, and Ronen 
and McKinney. Weaknesses associated with each solution are pointed out. 


The topic of transfer pricing in a divisionalized firm has been discussed in a large number of 
journal articles and other publications over the last 20 years. One of the fundamental contributions 
to the transfer pricing literature is a paper by J. Hirshleifer [2], which has become the starting 
point for many later investigations. Hirshleifer advocated marginal cost pricing. His proposal has
been largely acknowledged as a theoretically correct solution to the transfer price problem. Yet, in
a recent issue of this journal, H. Enzer [1] argued emphatically that Hirshleifer's solution is incorrect.
Instead, he proposed average cost pricing. This note will demonstrate that the solution proposed by 
Enzer is also subject to criticism. In fact, there are three different solutions, each with its own 
merits and faults. 

The simplest possible transfer pricing situation will be considered. A company consists of two 
divisions, the manufacturing and the distribution division. These will be labeled Division 1 and 
Division 2 throughout. The company makes one product. It is manufactured in Division 1 and then 
transferred to Division 2 for further refinement and marketing there. There is no intermediate 
market. The market for the finished product is perfectly competitive, implying a fixed price level. 
The company wants to find a production program maximizing profit. In what follows, fixed costs 
will be assumed away, since they are immaterial to the analysis. 

A minimum of notation must now be introduced. Let 

x: amount produced in Division 1 and transferred to Division 2; 

y: amount refined in Division 2 and sold to outsiders; 

C₁(x): total variable cost incurred in Division 1 as a function of quantity produced;

C₂(y): total variable cost incurred in Division 2 as a function of quantity refined and sold to
outsiders;

p: price of finished product.



The production-planning problem facing the company is then:

(1) Maximize with respect to x and y: py − C₁(x) − C₂(y)
    subject to: x = y.

Assume for simplicity that C₁(x) and C₂(y) are differentiable, strictly convex, and increasing
functions. To rule out pathological cases, it will also be assumed that:

(1) has a unique optimal solution (x̂, ŷ) > 0.

Suppose it is desired to solve (1) in a decentralized fashion, meaning here that Division 1 is 
allowed to decide on x and Division 2 on y. But such choices cannot be made independently; and 
transfer pricing is one means of taking into account the interdependence of the two divisions. The 
relevant transfer pricing problem then becomes: How should one fix a transfer price so that it will
induce Division 1 to pick x = x̂ and Division 2 to pick y = ŷ?


Three different solutions have been proposed to the transfer pricing problem : in addition to the 
Hirshleifer and Enzer solutions, there is also the one by J. Ronen and G. McKinney [4]. 

Consider a decomposition of the over-all problem (1) into divisional subproblems as follows:

(2) Division 1's problem: Maximize with respect to x: rx − C₁(x);

(3) Division 2's problem: Maximize with respect to y: py − C₂(y) − ry.

Here, r denotes the transfer price. One can show that under the assumptions made earlier, there
exists a transfer price r̂ with the property that, for r = r̂, x̂ solves (2) and ŷ solves (3) (see, for in-
stance, Mangasarian [3], pp. 80-82); this is the basis for the Hirshleifer solution. From (2), it
follows that r̂ must be equal to C′₁(x̂). That is, the correct transfer price is equal to the marginal cost of
Division 1 at the optimal production level, and that is precisely what Hirshleifer argues. To find r̂,
Hirshleifer suggests that Division 1 should inform Division 2 about its marginal cost function
C′₁(x). Division 2 could then determine x̂ and ŷ from the equation system

(4) C′₁(x) + C′₂(y) = p,
    x = y,

these being necessary and sufficient conditions for the over-all optimal solution (x̂, ŷ) to (1). r̂ would then
be determined by Division 2 as r̂ = C′₁(x̂) and sent back to Division 1. With that information, Divi-
sion 1 could determine its optimal production level x̂ by solving (2) for given r = r̂ = C′₁(x̂).

Against this solution, Enzer argues that Division 2 would have an opportunity to exploit
Division 1. That is, the transfer price is not independent of the amount acquired, x, as in problem
formulation (3). Rather, the transfer price depends on x, and the relevant problem, from Division
2's point of view, is in effect the following:

(5) Maximize with respect to x and y: py − C₂(y) − C′₁(x)x
    subject to: y = x.

Let the optimal solution to (5) be denoted (x̃, ỹ), which is different from (x̂, ŷ). Actually, (x̃, ỹ) <
(x̂, ŷ), which means that Division 2 is acting as a monopsonistic buyer to exploit Division 1. Also,
the transfer price becomes r̃ = C′₁(x̃) < r̂. Division 2 would send this transfer price r̃, rather than r̂,
to Division 1. Altogether, this results in a higher divisional profit for Division 2 but a lower divisional


profit for Division 1 and a lower total profit for the company as a whole. To avoid this situation,
Division 2 may have to be outright instructed to pick y = ŷ. However, that would defeat the purpose
of decentralization, according to Enzer ([1], p. 378).

To eliminate the possibility of Division 2 exploiting Division 1 while still permitting Division 2
to decide in a decentralized fashion, Enzer proposes that Division 2 should be given a price-quantity
relationship other than C′₁(x). Rather, Division 1 should submit to Division 2 (C₁(x)/x), which is
Division 1's average cost function. Taking this function as the price schedule, Division 2 constructs
the divisional subproblem

(6) Maximize with respect to x and y: py − C₂(y) − (C₁(x)/x)x
    subject to: y = x.

But this is obviously the same problem formulation as the over-all problem (1), meaning that
decentralized decision making by Division 2 is over-all optimal. The transfer price is now r̄ =
(C₁(x̂)/x̂). This leads Enzer to conclude that the average cost of the manufacturing Division 1 is the correct
transfer price.

However, Enzer's solution has a drawback, too. Suppose that the transfer price r̄ = (C₁(x̂)/x̂) is
sent back to Division 1. Division 1 would then solve its problem (2) for r = r̄ = (C₁(x̂)/x̂). The optimal
solution would in general be different from x̂. That is, if r̄ is to be used as the one and single transfer
price, then Division 1 must be instructed to pick x = x̂. But that would also defeat the purpose of
decentralization.

This suggests that two different transfer prices may be called for, one for each division. That is
precisely what Ronen and McKinney advocate. The discussion up to now has been somewhat
asymmetric in that Division 2 has been described as the exploiting division. However, Division
1 could equally well be thought of as exploiting Division 2. Namely, Division 1 realizes, too, that
the transfer price actually paid is not independent of the amount supplied. Suppose now that Division 1
is provided with the price-quantity relationship [(py − C₂(y))/y], representing Division 2's demand
curve for the intermediate product. Division 1 can then construct the following divisional subproblem:

(7) Maximize with respect to x and y: [(py − C₂(y))/y]y − C₁(x)
    subject to: x = y.

But this is obviously also the same problem as the over-all problem (1), and hence decentralized
decision making by Division 1 is over-all optimal, too. The transfer price credited to Division 1
becomes r = [(pŷ − C₂(ŷ))/ŷ] ≠ r̂. This transfer price is referred to as average revenue by Ronen and
McKinney.

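The contrast among the three prices can be seen on a worked instance. The following sketch (my own numbers, not from this note) uses quadratic costs C₁(x) = x², C₂(y) = y² and p = 6, for which every quantity has a closed form.

```python
# Hypothetical instance (my own numbers, not from the note):
# C1(x) = x^2, C2(y) = y^2, p = 6, so every quantity has a closed form.
p = 6.0
C1 = lambda x: x ** 2          # Division 1 total variable cost
dC1 = lambda x: 2.0 * x        # its marginal cost C1'(x)
C2 = lambda y: y ** 2          # Division 2 total variable cost

# Over-all optimum of (1): p = C1'(x) + C2'(x)  =>  6 = 4x  =>  x_hat = 1.5
x_hat = p / 4.0
r_hat = dC1(x_hat)             # Hirshleifer's marginal-cost price: 3.0

# Division 2 exploiting via (5): maximize p*x - C2(x) - C1'(x)*x = 6x - 3x^2
x_til = p / 6.0                # = 1.0 < x_hat: the monopsonistic cutback
r_til = dC1(x_til)             # = 2.0 < r_hat

# Enzer: the schedule C1(x)/x makes (6) coincide with (1), but the single
# number r_bar misleads Division 1 when fed back through (2):
r_bar = C1(x_hat) / x_hat      # = 1.5
x_div1 = r_bar / 2.0           # Division 1's argmax of r_bar*x - C1(x): 0.75, not 1.5

# Ronen-McKinney: Division 1 is instead credited at Division 2's average revenue
r_avg = (p * x_hat - C2(x_hat)) / x_hat   # = 4.5

print(x_hat, r_hat, x_til, r_til, r_bar, x_div1, r_avg)
```

The cutback from x̂ = 1.5 to x̃ = 1 under the marginal-cost schedule, and Division 1's response 0.75 ≠ 1.5 to the single average-cost price, are precisely the two weaknesses discussed above.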

Three different solutions, due to Hirshleifer, Enzer, and Ronen and McKinney, to a very
simple transfer pricing problem have been outlined. Each one has drawbacks:

1. Hirshleifer's solution, based on marginal cost, allows Division 2 to exploit Division 1.

2. Enzer's solution, based on average cost, eliminates this exploitation possibility, but it
implies that Division 1 must be outright instructed which production level to pick.


3. The solution by Ronen and McKinney permits decentralized decision making by both 
divisions and eliminates exploitation possibilities. However, it is a more complex solution, since it 
involves two different transfer prices, based on average cost and average revenue. 

One would hence have to agree with Enzer's own statement that no transfer price exists which 
cannot be faulted in some way ([1], p. 378). This statement applies to Enzer's own solution as well, 
and this author would hesitate to label the Hirshleifer solution as less correct than the other two. 


[1] Enzer, H., "The Static Theory of Transfer Pricing," Naval Research Logistics Quarterly, 22, 
375-389 (1975). 

[2] Hirshleifer, J., "On the Economics of Transfer Pricing," Journal of Business, 29, 172-184
(1956).
[3] Mangasarian, O., Nonlinear Programming (McGraw-Hill, New York, 1969). 

[4] Ronen, J., and G. McKinney, "Transfer Pricing for Divisional Autonomy," Journal of Ac- 
counting Research, 8, 99-112 (1970). 


In a longer unpublished paper, from which the article published in the Naval Research Logis-
tics Quarterly was extracted, it is suggested that the central office establish (C₁(x)/x) rather than
(C₁(x̂)/x̂) as the transfer price. To avoid the situation Jennergren points out, the recommended
solution in the longer article is to treat the manufacturing division as a cost center with the objec-
tive of cost minimization. It would seem that the appropriate objective for a manufacturing divi-
sion which does not sell its output in the market should be cost minimization rather than "profit"
maximization. Of course, a cost center creates its own problems, such as the design of suitable

Hermann Enzer. 


A Note on the Strategy of Restriction and Degenerate Linear Programming

Charles A. Holloway

Stanford University
Palo Alto, California


Using the general computational strategy of restriction, necessary conditions
for optimality provide an alternative criterion for entering variables when de-
generacy arises in linear programming problems. Although cycling may still
occur, it is shown that if it is possible to make progress at the next iteration, the
criterion is guaranteed to identify a non-basic variable which increases the value
of the basic solution, thereby reducing stalling. An alternative method for deter-
mining variables to exit the basis when degeneracies occur is also suggested.

Consider the following linear programming problem:

(1) Maximize Cx
    subject to Ax = b
    x ≥ 0

where A is an m × n matrix of rank m. If we denote a set of basic variables by x_B and non-basic variables
by x_NB, then the reduced costs used in the simplex criterion are given by C̄ = C_B A_B⁻¹ A_NB − C_NB.

Using the general computational strategy of restriction (see [3]), in which the restricted vari-
ables correspond to the non-basic variables, (1) can be written:

(2) Maximize Cx
    subject to Ax = b
    x_j ≥ 0, j ∈ F
    x_j = 0, j ∉ F

The computational strategy of restriction involves solving a sequence of problems in which some
of the variables are set equal to zero. Each solution is tested for optimality in the original problem.
If it is not optimal, then one or more restricted variables associated with negative multipliers
(λ_j < 0) are released (added to x_j, j ∈ F). If the solution to the last restricted problem resulted in an
increased optimal value, any nonrestricted variable whose optimal value is zero can be added to
the restricted set (see [5] for a detailed discussion of the use of restriction in concave programming).

If x̄ is an optimal solution to (2) and π is an optimal multiplier vector for (2), then it follows
directly from the Kuhn-Tucker conditions that x̄ is optimal in (1) if λ = Minimum_j (πᵀA_j − C_j) ≥ 0,
where A_j is the jth column of A.



If we require that x_j, j ∈ F in (2) correspond to basic variables in (1), then these conditions are
seen to encompass the ordinary simplex optimality conditions, and π = C_B A_B⁻¹ is an optimal multi-
plier vector. If a solution to (2) is non-degenerate, the well known result that the simplex multipliers
are unique is also easily recovered. (It can be readily verified that if non-degenerate solutions exist
at each iteration, the procedure based on restriction is identical to the simplex procedure.)

When degeneracies exist, optimal multipliers are no longer unique. Under these conditions,
Proposition 1 yields a pricing problem which gives the directional derivative of the optimal value
of (2) with respect to an increase in a restricted variable.

PROPOSITION 1: If x̄ is an optimal solution to (2), the directional derivative of the optimal
value of (2) with respect to an increase in a restricted variable x_j is given by:

(3) Ω_F(x_j) = Minimum over π of (C_j − πᵀA_j)
    subject to πᵀA_i − C_i ≥ 0, i ∈ F
    (πᵀA_i − C_i)x̄_i = 0, i ∈ F

PROOF: Since the optimal value of (2) is a concave function of the right-hand-side of the
constraints (see, e.g., [3]), the minimum subderivative of the optimal value of (2) with respect to
the right-hand-side of an equality constraint for a restricted variable is the directional derivative
associated with increasing the restricted variable. Subderivatives for restricted variables are given
by the corresponding components of the negative of optimal multiplier vectors, λ = πᵀA − C (see,
e.g., [4]). Therefore we have:

Ω_F(x_j) = Minimum over π of (C_j − πᵀA_j)
    s.t. λ_i ≥ 0, i ∈ F
    λ = πᵀA − C
    λᵀx̄ = 0

Expression (3) follows directly by recognizing that x̄_j = 0, j ∉ F. ||

If we define P = {x_j | Ω_F(x_j) > 0}, then under degenerate conditions we could choose entering
variables from the set x_j ∈ P whenever P ≠ ∅. As shown in Proposition 2, P ≠ ∅ is a necessary and
sufficient condition for an increase in the optimal value of (2) at the next iteration.

PROPOSITION 2: If the optimal solution to (2)ᵗ (superscripts denote iterations), x̄ᵗ, is not
optimal in (1), then the solution to (2)ᵗ⁺¹ is such that Cx̄ᵗ⁺¹ > Cx̄ᵗ if and only if some x_k ∈ Pᵗ ≠ ∅ is released
from the restricted set.

PROOF: Let x_k ∈ Pᵗ ≠ ∅ be released in (2)ᵗ, forming the new restricted problem denoted (R)ᵗ,
in which x_j ≥ 0, j ∈ Fᵗ ∪ k and x_j = 0, j ∉ Fᵗ ∪ k. If x* is an optimal solution to (R)ᵗ, then Cx* ≥ Cx̄ᵗ since
x̄ᵗ is feasible in (R)ᵗ.

Assume Cx* = Cx̄ᵗ, making x̄ᵗ an optimal solution to (R)ᵗ. Then, from the Kuhn-Tucker con-
ditions there must exist a π̄ such that π̄ᵀA_j − C_j ≥ 0 and (π̄ᵀA_j − C_j)x̄_jᵗ = 0 for j ∈ Fᵗ ∪ k. Therefore, π̄ is
feasible in (5) and consequently Ω_Fᵗ(x_k) ≤ 0.




(5) Ω_Fᵗ(x_k) = Minimum over π of (C_k − πᵀA_k)
    s.t. πᵀA_j − C_j ≥ 0, j ∈ Fᵗ
    (πᵀA_j − C_j)x̄_jᵗ = 0, j ∈ Fᵗ

But Ω_Fᵗ(x_k) > 0 by hypothesis, and therefore Cx* > Cx̄ᵗ. Since (2)ᵗ⁺¹ is either identical to (R)ᵗ or
formed by restricting variables which are at the zero level in x*, x* is an optimal solution to (2)ᵗ⁺¹

and Cx̄ᵗ⁺¹ > Cx̄ᵗ.

If Cx̄ᵗ⁺¹ > Cx̄ᵗ, then Cx* > Cx̄ᵗ and by definition the variable released, x_k, must satisfy Ω_Fᵗ(x_k) > 0
and hence x_k ∈ Pᵗ ≠ ∅. ||

It follows that P = ∅ is a necessary condition for optimality (but not sufficient, since the vector
of directional derivatives is not in general a subgradient).

The strategy of restriction also provides a mechanism for determining which currently un-
restricted variables should be included in the restricted set when degeneracies exist (i.e., deciding
which variable should leave the basis). If at iteration t, Pᵗ = ∅, Proposition 2 guarantees that the
optimal value of the restricted problem remains the same after a restricted variable is released, and
therefore no variable is transferred into the restricted set. Since x̄ᵗ is feasible in (2)ᵗ⁺¹ and Cx̄ᵗ⁺¹ = Cx̄ᵗ,
x̄ᵗ is an optimal solution to (2)ᵗ⁺¹ and consequently a solution to (2)ᵗ⁺¹ is already at hand when
Pᵗ = ∅. If x̄ᵗ is not optimal in (1), then, by sequentially releasing restricted variables to x_j, j ∈ F,
without transferring any variables to the restricted set, we will identify an x_k such that Ω_F(x_k) > 0.
When such an x_k is found, the value of the objective function will increase, and all variables which
are in the optimal solution at the zero level can be restricted.

The purpose of this note is to illustrate the application of restriction under conditions of 
degeneracy. No computational advantage (in production linear programming codes) is claimed. 
The following example demonstrates how stalling and cycling are prevented in a special case. 

An Example 

This example was constructed by Beale [1] and demonstrated both cycling and stalling when 
the ordinary simplex procedure is used without anti-cycling devices. 

(6) Maximize .75x₁ − 20x₂ + .5x₃ − 6x₄

    subject to x₅ + .25x₁ − 8x₂ − x₃ + 9x₄ = 0
               x₆ + .50x₁ − 12x₂ − .5x₃ + 3x₄ = 0
               x₇ + x₃ = 1

If the ordinary simplex procedure is used, x₁ is chosen for insertion into the degenerate basis
(x₅, x₆, x₇) and a tie between x₅ and x₆ occurs for the variable to be dropped. If an arbitrary rule of
dropping the variable with the lowest subscript is used, the seventh solution will be identical with the
initial solution displayed in (6). If Charnes' perturbation scheme [2] is used to resolve ties, x₆ is
chosen and, although no progress is made this iteration, at the next iteration there is no tie for a
vector to be removed, and the optimal solution is obtained in one additional iteration.

If directional derivatives are used, we solve (3) by checking a₁ⱼ and a₂ⱼ for j = 1 and 3 (C̄₁, C̄₃ < 0).
Since a₁₁, a₂₁ > 0, Ω_B(x₁) = −∞; however, a₁₃, a₂₃ < 0 and Ω_B(x₃) = .5. Therefore, we choose x₃ to enter
the basis and x₇ leaves. The revised problem becomes:

(7) Maximize .5 − .5x₇ + .75x₁ − 20x₂ − 6x₄

    subject to x₅ + x₇ + .25x₁ − 8x₂ + 9x₄ = 1
               x₆ + .5x₇ + .50x₁ − 12x₂ + 3x₄ = .5
               x₃ + x₇ = 1

From (7) we see that all degeneracies have been resolved. We bring in x₁ and x₆ exits, resulting in:

(8) Maximize 1.25 − 1.5x₆ − 1.25x₇ − 2x₂ − 10.5x₄

    subject to x₅ − .5x₆ + .75x₇ − 2x₂ + 7.5x₄ = .75
               2x₆ + x₇ + x₁ − 24x₂ + 6x₄ = 1
               x₃ + x₇ = 1

which is optimal. 
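These two pivots can be checked mechanically. The following sketch (my own, not code from the note) replays them with a dense tableau; row 0 stores z − c′x = 0, so after each pivot its last entry is the current objective value.

```python
# A sketch of my own (not code from the note): replay the two pivots on Beale's
# example (6) with a dense tableau. Row 0 stores z - c'x = 0, so after each
# pivot its last entry is the current objective value.
cols = ["x1", "x2", "x3", "x4", "x5", "x6", "x7"]
T = [
    [-0.75, 20.0, -0.5, 6.0, 0.0, 0.0, 0.0, 0.0],  # z-row of (6)
    [0.25, -8.0, -1.0, 9.0, 1.0, 0.0, 0.0, 0.0],   # x5 basic, rhs 0 (degenerate)
    [0.50, -12.0, -0.5, 3.0, 0.0, 1.0, 0.0, 0.0],  # x6 basic, rhs 0 (degenerate)
    [0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 1.0, 1.0],      # x7 basic, rhs 1
]

def pivot(T, r, c):
    """Gauss-Jordan pivot on T[r][c]."""
    T[r] = [v / T[r][c] for v in T[r]]
    for i in range(len(T)):
        if i != r and T[i][c] != 0.0:
            f = T[i][c]
            T[i] = [v - f * w for v, w in zip(T[i], T[r])]

# Omega_B(x1) = -inf (a11, a21 > 0 against degenerate rows), Omega_B(x3) = .5,
# so x3 enters and x7 leaves:
pivot(T, 3, cols.index("x3"))
assert abs(T[0][-1] - 0.5) < 1e-12     # objective value of (7)

# x1 enters; ratio test: 1/.25 = 4 in row 1 vs .5/.5 = 1 in row 2, so x6 leaves:
pivot(T, 2, cols.index("x1"))
print(T[0][-1])                        # optimal value 1.25, as in (8)
```

After the second pivot all z-row entries are nonnegative, which is the optimality test recovered by the assertion below.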


[1] Beale, E. M. L., "Cycling in the Dual Simplex Algorithm," Naval Research Logistics Quarterly,
2 (4) (1955).

[2] Charnes, A., "Optimality and Degeneracy in Linear Programming," Econometrica, 20 (2)
(1952).

[3] Geoffrion, A. M., "Elements of Large-Scale Mathematical Programming," I & II, Management
Science, 16 (11) (1970).

[4] Geoffrion, A. M., "Duality in Nonlinear Programming: A Simplified Applications-Oriented
Development," SIAM Review, 13 (1) (1971).

[5] Holloway, C. A., "A Generalized Approach to Dantzig-Wolfe Decomposition for Concave
Programs," Operations Research, 21 (1) (January-February, 1973).



A Note on Distribution-Free Tolerance Limits*

Z. Govindarajulu

University of Kentucky
Lexington, Kentucky


This note presents methods for solving for any one of the four parameters (such 
as sample size) involved in the construction of distribution-free tolerance limits in 
terms of the other three. These solutions are based on a normal approximation to 
the incomplete beta function. Numerical examples indicate that the approximations 
are very reasonable. Also considered are tolerance limits with a specified precision. 


Let X_{1N} < . . . < X_{NN} denote the order statistics in a random sample of size N drawn from a
continuous population having an unknown distribution function (d.f.) F(x). Then it is well-known
that the proportion of the population covered by the tolerance interval (X_{r,N}, X_{N−s+1,N}) exceeds
β (0 < β < 1) with confidence γ provided 1 − I_β(N−k+1, k) ≥ γ, where k = r + s < N + 1 and I_β de-
notes Karl Pearson's incomplete beta function (see Wilks [3], p. 334). Sommerville [2] has tabu-
lated the values of k for given N, β, and γ, and the values of γ for specified N, k, and β. It is of
much interest to solve for any one of N, k, and γ when the others are specified. However,
it is inconvenient to work with the incomplete beta function. In the following, we shall give a
simple approximation to the incomplete beta function, which is based on the normal distribution.

PROPOSITION 1: For N sufficiently large, we have

(1) I_β(N−k+1, k) ≈ Φ( [k − 1/2 − N(1−β)] / {Nβ(1−β)}^{1/2} ),

where Φ denotes the standard normal d.f. and I_x(a, b) denotes the incomplete beta function having
parameters a and b.

PROOF: Let W(N, 1−β) be a binomial random variable having parameters N and 1−β.
The proposition follows from the well known relationship P(W(N, 1−β) ≤ k−1) = I_β(N−k+1, k)
and the use of the normal approximation to the binomial with the correction for continuity.
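The approximation can be checked directly, since I_β(N−k+1, k) equals a binomial tail probability. The following sketch (my own, with assumed values N = 100, β = .9, k = 5) compares exact summation against (1):

```python
# A direct check (my own, with assumed values N = 100, beta = .9, k = 5) of
# I_beta(N-k+1, k) = P(W <= k-1), W ~ Binomial(N, 1-beta), against (1).
import math

def binom_cdf(n, p, m):
    """P(W <= m) for W ~ Binomial(n, p), by direct summation."""
    return sum(math.comb(n, j) * p**j * (1.0 - p) ** (n - j) for j in range(m + 1))

def Phi(t):
    """Standard normal d.f. via the error function."""
    return 0.5 * (1.0 + math.erf(t / math.sqrt(2.0)))

N, beta, k = 100, 0.9, 5
exact = binom_cdf(N, 1.0 - beta, k - 1)
approx = Phi((k - 0.5 - N * (1.0 - beta)) / math.sqrt(N * beta * (1.0 - beta)))
print(exact, approx)   # roughly .024 vs .033: crude this far out in the tail
```

The fit is rough in the far tail but, as the examples below show, good enough for the inversions of (1) that follow.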

SOLUTION FOR γ: Equation (1) is already in a form to obtain γ in terms of N, k, and β.

SOLUTION FOR β WHEN N, k AND γ ARE SPECIFIED: Let z = Φ⁻¹(1−γ), which
will be negative when γ > 1/2. Squaring the inequality

k − 1/2 − N(1−β) ≤ z{Nβ(1−β)}^{1/2}

*Research supported in part by the Navy under Office of Naval Research Contract No. N00014-75-C-1003,
task order NR042-295. Reproduction in whole or in part is permitted for any purpose of the United States Government.


and solving the resultant quadratic equation leads to the solution

(2) β = [(1−a) + z²/2N ± (z/N^{1/2}){a(1−a) + z²/4N}^{1/2}] / (1 + z²/N),

where a = (k − 1/2)/N.

It should be noted that we should take the smaller of the two roots for our purpose, since the
larger root may exceed unity and we want the coefficient to be at least γ. Also, when N is large,

(3) β ≈ (1−a) + (z/N^{1/2}){a(1−a)}^{1/2}.

EXAMPLE 1: Let k = 5, N = 100 and γ = .95. Then z = −1.645 and equation (2) yields β = .907
and equation (3) gives β = .921, whereas the true value of β is .90 (see Sommerville [2]).
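Equations (2) and (3) are easy to check numerically. The sketch below (my own code) reproduces Example 1, using `statistics.NormalDist` for the normal quantile Φ⁻¹:

```python
# Reproducing Example 1 (my own sketch) with the closed forms (2) and (3).
from math import sqrt
from statistics import NormalDist

def beta_quadratic(N, k, gamma):
    """Equation (2), taking the smaller root (the '+' branch, since z < 0)."""
    z = NormalDist().inv_cdf(1.0 - gamma)
    a = (k - 0.5) / N
    num = (1.0 - a) + z * z / (2 * N) + (z / sqrt(N)) * sqrt(a * (1.0 - a) + z * z / (4 * N))
    return num / (1.0 + z * z / N)

def beta_largeN(N, k, gamma):
    """Equation (3), the large-N simplification."""
    z = NormalDist().inv_cdf(1.0 - gamma)
    a = (k - 0.5) / N
    return (1.0 - a) + (z / sqrt(N)) * sqrt(a * (1.0 - a))

print(round(beta_quadratic(100, 5, 0.95), 3),  # 0.907
      round(beta_largeN(100, 5, 0.95), 3))     # 0.921
```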

SOLUTION FOR N: Let N(1−β) = M and let z = Φ⁻¹(1−γ). Solving for M^{1/2} from the
resultant quadratic, we obtain

M^{1/2} = [−zβ^{1/2} ± {βz² + 4(k − 1/2)}^{1/2}]/2.

Hence

(4) N = [−zβ^{1/2} + {βz² + 4(k − 1/2)}^{1/2}]² / 4(1−β).

EXAMPLE 2: Set γ = .95, β = .9, k = 9, z = −1.645. Then

M^{1/2} = (1.5604 ± 6.036)/2.

Here we take the larger of the two roots (so as to be conservative) and get

M^{1/2} = 3.798, yielding N = 144.25,

whereas the true value from Sommerville [2] is 144.
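Equation (4) can be checked the same way; this sketch (my own) reproduces Example 2:

```python
# Reproducing Example 2 (my own sketch) with equation (4).
from math import sqrt
from statistics import NormalDist

def N_required(k, beta, gamma):
    z = NormalDist().inv_cdf(1.0 - gamma)             # about -1.645 for gamma = .95
    root = (-z * sqrt(beta) + sqrt(beta * z * z + 4.0 * (k - 0.5))) / 2.0  # larger root
    return root * root / (1.0 - beta)                 # N = M / (1 - beta)

print(N_required(9, 0.9, 0.95))   # about 144.3; Sommerville's exact value is 144
```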

SOLUTION FOR k: Let z = Φ⁻¹(1−γ). Then

(5) k ≤ z{Nβ(1−β)}^{1/2} + N(1−β) + 1/2.

In the following we tabulate the value of k for specified values of N, β, and γ and compare them
with the true values obtained by Sommerville [2]. The numbers in parentheses represent the true
values given by Sommerville [2].




     N      γ = .99, β = .95:                  γ = .95, β = .90:
            k ≤ .5 + (.05)N − (.51)N^{1/2}     k ≤ .5 + (.1)N − (.49)N^{1/2}

    100        .42  (1)                           5.57  (5)
    121        .96  (1)                           7.17  (7)
    144       1.61  (2)                           8.98  (9)
    169       2.35  (3)                          10.98  (11)
    400      10.35  (11)                         30.63  (30)
    625      19.06  (19)                         50.65  (50.5)
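A spot-check of the tabulation (my own arithmetic), assuming the sample sizes N = 100, 121, 144, 169, 400, 625, the values consistent with the tabled entries:

```python
# Spot-checking the tabulation (my own arithmetic), assuming the sample sizes
# N = 100, 121, 144, 169, 400, 625 -- the values consistent with the tabled entries.
from math import sqrt
from statistics import NormalDist

def k_max(N, beta, gamma):
    """Right-hand side of (5)."""
    z = NormalDist().inv_cdf(1.0 - gamma)
    return z * sqrt(N * beta * (1.0 - beta)) + N * (1.0 - beta) + 0.5

for N in (100, 121, 144, 169, 400, 625):
    print(N, round(k_max(N, 0.95, 0.99), 2), round(k_max(N, 0.90, 0.95), 2))
```

The printed values agree with the table to within about a unit in the second decimal (the .51 and .49 in the column headings are rounded coefficients).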


One can relate the tolerance and confidence limits by considering the generalized tolerance
limits proposed in the following. Let U_{r,s,N} = F(X_{N−s+1,N}) − F(X_{r,N}). Also let a = EU_{r,s,N} = 1 − k/(N+1),
where k = r + s. Consider the following relations:

(i) P(|U_{r,s,N} − a| ≤ Δ) ≥ γ,

(ii) P(U_{r,s,N} − a ≥ −Δ) ≥ γ,

(iii) P(U_{r,s,N} − a ≤ Δ) ≥ γ.

Then one should be able to solve for one of the quantities a, Δ, γ, and N in terms of the rest. This
formulation is somewhat appealing since it is analogous to setting up one-sided or two-sided fixed-
width confidence intervals for a with specified γ. However, a does not act like an unknown parameter,
since its value will be known as soon as r + s and N are specified. Using methods analogous to those
of Section 1, one can solve for Δ for specified a, γ, and N. For further details the reader is referred
to the author's (1976) technical report [1].


Recall that the proportion of the population covered by (X_{r,N}, X_{N−s+1,N}) is

U_{r,s,N} = F(X_{N−s+1,N}) − F(X_{r,N}) = U_{N−s+1,N} − U_{r,N},

where U_{1N} < . . . < U_{NN} denote the order statistics in a random sample of size N drawn from the
uniform (0, 1)-population. Straightforward computations yield

(6) EU_{r,s,N} = 1 − k(N+1)⁻¹, k = r + s,

(7) Var U_{r,s,N} = k(N+1−k)/(N+1)²(N+2) = a(1−a)/(N+2),

where 1 − a = EU_{r,s,N}.


Letting Var U_{r,s,N} ≤ b is equivalent to either

(8) δ = 1 − a ≥ (1/2)[1 + {1 − 4b(N+2)}^{1/2}]

or

δ ≤ (1/2)[1 − {1 − 4b(N+2)}^{1/2}],

where we assume that b ≤ 1/4(N+2).

EXAMPLE 3: Let b = .002, N = 99; this implies that either k ≥ 72 or k ≤ 28.
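Example 3 follows from (8) by direct computation; a sketch (my own):

```python
# Reproducing Example 3 (my own sketch): b = .002, N = 99 in the variance bound (8).
from math import sqrt

b, N = 0.002, 99
root = sqrt(1.0 - 4.0 * b * (N + 2))      # requires b <= 1/(4(N+2))
delta_hi = 0.5 * (1.0 + root)             # delta = 1 - a = 1 - k/(N+1) >= delta_hi ...
delta_lo = 0.5 * (1.0 - root)             # ... or delta <= delta_lo

k_low = (N + 1) * (1.0 - delta_hi)        # delta >= delta_hi  <=>  k <= 28.09
k_high = (N + 1) * (1.0 - delta_lo)       # delta <= delta_lo  <=>  k >= 71.91
print(k_low, k_high)                      # so k <= 28 or k >= 72
```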
When a, β, and γ are specified, one can solve for N in the following manner:

P(U_{r,s,N} ≥ β) ≥ γ

implies, after using equation (1), that

k ≤ z{Nβ(1−β)}^{1/2} + N(1−β) + 1/2, z = Φ⁻¹(1−γ).

That is, since k = (N+1)(1−a),

β − a + (1 − 2a)/2N ≤ z{β(1−β)/N}^{1/2}.

Ignoring the (2N)⁻¹ term and solving for N, we have

N^{1/2} ≥ −z{β(1−β)}^{1/2}/(a−β), a > β,

and hence

(9) N ≥ z²[β(1−β)]/(a−β)².

EXAMPLE 4: Let a = .95, β = .9, γ = .95. Then z = −1.645 and equation (9) gives N ≥ 98.
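And equation (9) reproduces Example 4 (again my own sketch):

```python
# Reproducing Example 4 (my own sketch) with the sample-size bound (9).
from statistics import NormalDist

def N_bound(a, beta, gamma):
    """Right-hand side of (9): z^2 * beta * (1 - beta) / (a - beta)^2."""
    z = NormalDist().inv_cdf(1.0 - gamma)
    return z * z * beta * (1.0 - beta) / (a - beta) ** 2

print(N_bound(0.95, 0.9, 0.95))   # about 97.4, so N >= 98
```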


I thank the referee for a critical reading of the paper. 


[1] Govindarajulu, Z., A Note on Distribution-free Tolerance Limits, University of Kentucky 

Department of Statistics, Technical Report No. 97 (1976). 
[2] Sommerville, P. N., "Tables for Obtaining Nonparametric Tolerance Limits," Annals of Mathematical
Statistics, 29, 599-601 (1958).
[3] Wilks, S.S., Mathematical Statistics (John Wiley and Sons, New York, 1962). 

.U.S. GOVERNMENT PRINTING OFFICE: 1977 240-830/2 1- 


The NAVAL RESEARCH LOGISTICS QUARTERLY is devoted to the dissemination of 
scientific information in logistics and will publish research and expository papers, including those 
in certain areas of mathematics, statistics, and economics, relevant to the over-all effort to improve 
the efficiency and effectiveness of logistics operations. 

Manuscripts and other items for publication should be sent to The Managing Editor, NAVAL 
RESEARCH LOGISTICS QUARTERLY, Office of Naval Research, Arlington, Va. 22217. 
Each manuscript which is considered to be suitable material for the QUARTERLY is sent to one
or more referees. 

Manuscripts submitted for publication should be typewritten, double-spaced, and the author 
should retain a copy. Refereeing may be expedited if an extra copy of the manuscript is submitted 
with the original. 

A short abstract (not over 400 words) should accompany each manuscript. This will appear 
at the head of the published paper in the QUARTERLY. 

There is no authorization for compensation to authors for papers which have been accepted 
for publication. Authors will receive 250 reprints of their published papers. 

Readers are invited to submit to the Managing Editor items of general interest in the field 
of logistics, for possible publication in the NEWS AND MEMORANDA or NOTES sections 
of the QUARTERLY. 







A Priori Error Bounds for Procurement Commodity Aggregation in Logistics 
Planning Models 

A Node Covering Algorithm 

Two-Characteristic Markov-Type Manpower Flow Models 

Joint Pricing and Ordering Policy for Exponentially Decaying Inventory 
with Known Demand 

Estimation of Ordered Parameters from k Stochastically Increasing Distributions
An M/M/l Queue with Delayed Feedback 

Optimal Dynamic Rules for Assigning Customers to Servers in a Heteroge- 
neous Queuing System 

Computing Bounds for the Optimal Value in Linear Programming 

Solving Multicommodity Transportation Problems Using a Primal Parti- 
tioning Simplex Technique 

Stochastic Transportation Problems and Other Network Related Convex 

Probabilistic Ideas in Heuristic Algorithms for Solving Some Scheduling 

Force-Annihilation Conditions for Variable-Coefficient Lanchester-Type 
Equations of Modern Warfare 

The Divisionalized Firm Revisited — A Comment on Enzer's "The Static 
Theory of Transfer Pricing" 

A Note on the Strategy of Restriction and Degenerate Linear Programming 

A Note on Distribution-Free Tolerance Limits 




W. J. HAYNE 235 

M. A. COHEN 257 

H. J. CHEN 269 







J. G. TAYLOR 3« 


C. A. HOLLOWAY 377

